Re: Plane-2-only string

2017-11-14 Thread Bobby Tung via Unicode
Hello,

Here's a list of frequently used Han characters for Hakka and Minnan, Chinese 
dialects.

It contains several EXT-B characters that you can test: 

http://bobbytung.github.io/TaigiHakkaIdeograph/
https://docs.google.com/spreadsheets/d/18CUbZ7tsvZ4QbUj3xcfYi9EGqsft4T37WtUMX9v2STQ/pubhtml


Bobby Tung
W3C invited expert
Editor of CLREQ



> On 14 Nov 2017 at 1:45 PM, via Unicode <unicode@unicode.org> wrote:
> 
> Dear Peter,
> 
> since the Chinese characters below are meaningless in Chinese, using them 
> should not be a first choice: they are gibberish, just not complete 
> gibberish.
> 
> Plane 2 has a fair number of older Chinese characters, so someone with a 
> knowledge of ancient Chinese might well be able to make something meaningful. 
> Running a competition in China would be one way to get suggestions; spotting a 
> good suggestion is easier than making one.
> 
> Plane 2 has Cantonese, Vietnamese and Zhuang characters. The number of 
> Cantonese characters is small, so making phrases using only them would be 
> difficult. Both Vietnamese and Zhuang have a much larger number of characters, 
> so it is much easier to make something meaningful with them.
> 
> The following Zhuang proverb, or saying
> 
> 뭴�풹㐡撘霋��
> 
> "Plant sweet potatoes in the field, and raise pigs in the sty." [Lit.: house; 
> the bottom floor of a traditional house is used for livestock, and people 
> live on the floor above.]
> 
> However, the third and eighth characters are not the most commonly used.
> 
> Regards
> John
> 
> On 14.11.2017 06:38, Peter Constable via Unicode wrote:
>> I discussed this with one of my Chinese co-workers, and we came up
>> with the following:
>> 
>> “
>> 欣欤欥欦欧
>> 橒橓橔橕橖
>> 裫裬裭裮裯”
>> 
>> Factors in the choice of characters were:
>> - different radicals
>> - for a given radical, have a sequence of consecutive characters so
>> people get the idea it's not a sentence but just a sequence of
>> characters with related meanings
>> - radical groups increase in complexity
>> 
>> 
>> It's not a sentence that can be read, but there's an obvious pattern,
>> so it's also not completely gibberish.
>> 
>> 
>> Peter
> 



Re: Plane-2-only string

2017-11-13 Thread via Unicode



With over a thousand Zhuang characters, Zhuang would work, though of 
course it would not have punctuation.



Off the top of my head, something like:-

톸톛昭퓨쾀쿇
톸톨퓨멒얙
컹퓨፹왙෯꽖

In romanised Zhuang:-

Gou bae ranz gyoengqde
gou youq ranz ndaw gwn haeux
aen ranz baihlaeng miz naz

In English:-

I went to their house
I ate a meal in the house
behind the house were paddy fields


A native speaker would of course do much better.


Regards
John Knightley


Re: Plane-2-only string

2017-11-13 Thread Phake Nick via Unicode
Perhaps the Martian language (http://en.wikipedia.org/wiki/Martian_language)
should be considered as a way to construct an example Chinese sentence from
characters that are only in Plane 2? It could probably be understood by
more people than something in Cantonese, too.


Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
Philippe Verdy wrote,

> ... Likewise, the newline doesn't need any font; it is synthesized by renderers.

It's true that fonts don't need to have glyphs mapped for control
characters, but I'd hesitate to use any control character in a font's
sample-text field because of the field's intended use.  But this is
moot, since Peter has reminded us that the fonts in question already
have some BMP characters mapped, including certain punctuation
characters.

An ExtB font with BMP Basic Latin could display the English-language
default sample text "The quick brown fox..." with no problem, but a
non-English locale might substitute a default text string that the
font could not support.  So it's probably best to have *something* in
that field representing characters the font covers.
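A minimal sketch of the check James describes, assuming the font's cmap coverage is already available as a set of code points (for example from fontTools, where `TTFont(path)["cmap"].getBestCmap()` returns a dict keyed by code point). The function name `check_sample_text` is hypothetical. It lists characters the font does not map and separately flags control characters (general category Cc), which need no glyph but which James would avoid in this field.

```python
from __future__ import annotations

import unicodedata


def check_sample_text(sample: str, covered: set[int]) -> dict[str, list[str]]:
    """Report problems with a candidate sample-text string.

    covered: code points mapped in the font's cmap (assumed to be
    obtained elsewhere, e.g. with fontTools).
    """
    missing: list[str] = []
    controls: list[str] = []
    for ch in sample:
        cp = ord(ch)
        if unicodedata.category(ch) == "Cc":
            # Control characters such as U+000A need no glyph, but are
            # questionable in a sample-text field.
            controls.append(f"U+{cp:04X}")
        elif cp not in covered:
            missing.append(f"U+{cp:04X}")
    return {"missing": missing, "controls": controls}


if __name__ == "__main__":
    # Hypothetical coverage: printable Basic Latin plus three Plane 2 code points.
    coverage = set(range(0x20, 0x7F)) | {0x20BA8, 0x20C41, 0x20D69}
    text = "\U00020BA8\U00020C41\n\U00020D69\U00020E09"
    print(check_sample_text(text, coverage))
    # {'missing': ['U+20E09'], 'controls': ['U+000A']}
```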


Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
Any font would likely map the space (and, for any CJK font, probably the
ideographic space). Likewise, the newline doesn't need any font; it is
synthesized by renderers. This could be used to compose a Japanese-like
haiku with some meaning...

2017-11-13 23:54 GMT+01:00 James Kass via Unicode :

> Peter Constable wrote,
>
> > “
> > 欣欤欥欦欧
> > 橒橓橔橕橖
> > 裫裬裭裮裯”
> >
>
> “ 欣欤欥欦欧 橒橓橔橕橖 裫裬裭裮裯”
>
> It looks good in blocks on four separate lines, but would a typical
> font viewing or comparison tool be expected to break it down into four
> lines?  The pattern is still apparent if displayed on just one line,
> but separating the blocks with spaces or any punctuation would require
> BMP characters in the ExtB font.
>
> “欣欤欥欦欧橒橓橔橕橖裫裬裭裮裯”
>
>


RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
As mentioned in my initial mail, the fonts support the Basic Latin block from 
the BMP.

Peter
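Since Basic Latin is mapped, the character blocks could be separated with U+0020 SPACE. A minimal sketch, assuming the allowed repertoire is exactly Basic Latin (U+0000..U+007F) plus Plane 2; `compose_sample` and `uses_only_supported` are hypothetical names, and the sample blocks reuse code points from John Jenkins's Cantonese list rather than Peter's string.

```python
from __future__ import annotations

BASIC_LATIN = range(0x0000, 0x0080)
PLANE_2 = range(0x20000, 0x30000)


def uses_only_supported(text: str) -> bool:
    """True if every character is Basic Latin or in Plane 2."""
    return all(ord(ch) in BASIC_LATIN or ord(ch) in PLANE_2 for ch in text)


def compose_sample(blocks: list[str], sep: str = " ") -> str:
    """Join character blocks with an ASCII separator and validate the result."""
    sample = sep.join(blocks)
    if not uses_only_supported(sample):
        raise ValueError("sample uses characters outside Basic Latin and Plane 2")
    return sample


if __name__ == "__main__":
    # Placeholder blocks built from code points in the Cantonese list.
    blocks = ["\U000201A9\U00020325\U00020341", "\U00020BA8\U00020BA9\U00020BCB"]
    s = compose_sample(blocks)
    print(len(s), uses_only_supported(s))     # 7 True
    print(uses_only_supported("\u201c" + s))  # False
```

Typographic quotation marks such as U+201C are in the General Punctuation block, not Basic Latin, so a check like this rejects them.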




RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
I discussed this with one of my Chinese co-workers, and we came up with the 
following:

“
欣欤欥欦欧
橒橓橔橕橖
裫裬裭裮裯”

Factors in the choice of characters were:
- different radicals
- for a given radical, have a sequence of consecutive characters so people get 
the idea it's not a sentence but just a sequence of characters with related 
meanings
- radical groups increase in complexity


It's not a sentence that can be read, but there's an obvious pattern, so it's 
also not completely gibberish.


Peter
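The "sequence of consecutive characters" criterion can be checked on the code-point side with a small helper. This is a sketch only: `consecutive_runs` is a hypothetical name, it checks adjacency of code points rather than radicals (Unihan's kRSUnicode field would be one source for radical data), and the sample code points are taken from John Jenkins's Cantonese list purely for illustration.

```python
from __future__ import annotations


def consecutive_runs(text: str) -> list[str]:
    """Split text into maximal runs of consecutive code points."""
    runs: list[str] = []
    for ch in text:
        if runs and ord(ch) == ord(runs[-1][-1]) + 1:
            runs[-1] += ch  # extends the current run
        else:
            runs.append(ch)  # starts a new run
    return runs


if __name__ == "__main__":
    cps = [0x20C41, 0x20C42, 0x20C43, 0x20E0E, 0x20E0F, 0x20E10]
    sample = "".join(chr(cp) for cp in cps)
    for run in consecutive_runs(sample):
        print(" ".join(f"U+{ord(c):X}" for c in run))
    # U+20C41 U+20C42 U+20C43
    # U+20E0E U+20E0F U+20E10
```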




Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

So the sample text would lack punctuation.  Given that the
Supplementary Ideographic Plane is composed of rare and historical
characters from multiple sources, I suspect that the short answer to
Peter's original question is:  "No".


RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Would a typical Chinese speaker be likely to recognize these as used in 
Cantonese? (I wouldn't want to have a font's sample-text string give the 
impression that it's a Cantonese font — unless it were specifically intended 
for Cantonese.)


Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
2017-11-13 21:48 GMT+01:00 James Kass :

> Peter Constable wrote,
>
> >> Maybe this test page?
> >>
> >> http://www.i18nguy.com/unicode/supplementary-test.html
> >
> > Thanks. I’d need to know _at least something_ about what the characters
> > signify, though, to have a sense of whether there’s anything potentially
> > offensive.
>
> The Plane 2 characters on that page appear to be random.
>

That's probable, but the authors claim these are common characters. It's
possible they collected statistics from some corpus to find some of the
most widely used characters in Plane 2, without needing to understand what
they would mean when put side by side. (I had already noted that there was
no punctuation at all, and the collection shown is too long for a typical
Chinese text; in fact I would expect the presence of some CJK punctuation.)
Maybe we could compile a list of Chinese toponyms using these, select
those that use more than one Plane 2 character, then separate these names
using CJK commas and a final CJK full stop.

Some Wikidata or OSM data searches could be used to compile such a list. (I
think these toponyms are more likely to be found in Cantonese- or
Taiwanese-related sources, using the zh-Hant variant. Note, however, that
Wikidata does not distinguish zh-Hans and zh-Hant, as Wikimedia wikis use a
transliterator; I doubt this transliterator transforms Plane 2 characters,
most of which should remain unchanged for both traditional and simplified
use.)
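Philippe's toponym idea can be sketched as a filter plus a join. Assumptions: the candidate names come from elsewhere (for example a Wikidata or OSM query, not shown); `toponym_sample` and `plane2_count` are hypothetical names; and "CJK commas" is taken to mean U+3001 IDEOGRAPHIC COMMA. U+3001 and U+3002 IDEOGRAPHIC FULL STOP are BMP characters, so the font would have to map them. The names below are placeholder strings, not real place names.

```python
from __future__ import annotations

IDEOGRAPHIC_COMMA = "\u3001"
IDEOGRAPHIC_FULL_STOP = "\u3002"


def plane2_count(name: str) -> int:
    """Number of characters in name that lie in Plane 2."""
    return sum(1 for ch in name if 0x20000 <= ord(ch) <= 0x2FFFF)


def toponym_sample(names: list[str], min_plane2: int = 2) -> str:
    """Keep names with at least min_plane2 Plane 2 characters, joined by
    ideographic commas and closed with an ideographic full stop."""
    kept = [n for n in names if plane2_count(n) >= min_plane2]
    if not kept:
        return ""
    return IDEOGRAPHIC_COMMA.join(kept) + IDEOGRAPHIC_FULL_STOP


if __name__ == "__main__":
    # Placeholder "names" built from code points; not real toponyms.
    names = ["\U00020BA8\U00020C41", "\u6771\U00020D69", "\U00020E0E\U00020E0F\U00020E10"]
    print([f"U+{ord(c):04X}" for c in toponym_sample(names)])
```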


Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
Peter Constable wrote,

>> Maybe this test page?
>>
>> http://www.i18nguy.com/unicode/supplementary-test.html
>
> Thanks. I’d need to know _at least something_ about what the characters
> signify, though, to have a sense of whether there’s anything potentially
> offensive.

The Plane 2 characters on that page appear to be random.



Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

How about Plane 15 or 16, then?
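For reference, a code point's plane is its value divided by 0x10000, and Planes 15 and 16 hold the Supplementary Private Use Areas (U+F0000..U+FFFFD and U+100000..U+10FFFD). A small sketch with hypothetical helper names; code points in those planes have no standard meaning, so a sample string there would only be meaningful together with the font that maps them.

```python
from __future__ import annotations


def plane(cp: int) -> int:
    """Return the Unicode plane (0..16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16


def is_private_use(cp: int) -> bool:
    """True for the BMP PUA and the Supplementary PUA-A/B (Planes 15 and 16)."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


if __name__ == "__main__":
    for cp in (0x4E00, 0x20BA8, 0xF0000, 0x10FFFD):
        print(f"U+{cp:04X}", plane(cp), is_private_use(cp))
```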


Re: Plane-2-only string

2017-11-13 Thread John H. Jenkins via Unicode
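U+201A9 U+20325 U+20341 U+204FC U+20544 U+2076D U+20779 U+20BA8
U+20BA9 U+20BCB U+20BE6 U+20BFD U+20C0B U+20C32 U+20C41 U+20C42
U+20C43 U+20C53 U+20C58 U+20C65 U+20C77 U+20C78 U+20C9C U+20CCF
U+20CD5 U+20CD6 U+20D15 U+20D30 U+20D47 U+20D48 U+20D49 U+20D69
U+20D6F U+20D7C U+20D7E U+20D7F U+20D9C U+20DA7 U+20DB2 U+20E09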

That is an example of forty Cantonese-specific characters which are not obscene 
(that I'm aware of) from Extension B. For the curious, I've appended at the 
bottom the full list of 280 for all of Plane 2 which I was able to pull out of 
the Unihan database. I'm sure some enterprising poet can make something out of 
them.

> On Nov 13, 2017, at 11:20 AM, Peter Constable via Unicode 
>  wrote:
> 
> I’m wondering if anyone could come up with a string of 15 to 40 characters 
> _using only plane 2 characters_ that wouldn’t be gibberish?
> 
> We are considering adding sample-text strings in some of our fonts. (In 
> OpenType, the ‘name’ table can take sample-text strings using name ID 19.) 
> One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which 
> have CJK characters from plane 2 only.
> 
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and 
> MingLiU fonts: the combined glyph count exceeds the number of glyphs that can 
> be added in a single OpenType font, and so the “ExtB” fonts are used to 
> contain all of the Plane 2 characters that are supported. For example, the 
> Simsun font supports 28,738 BMP characters, and no plane 2 characters, while 
> Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 
> characters. The combined glyph count exceeds 64K, so they can’t go into a 
> single font.
> 
> 
> 
> Peter
> 
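The constraints in the question can be stated as code. A sketch with hypothetical function names: the sample must be 15 to 40 characters, all in Plane 2. The 'name' table stores Windows-platform strings as UTF-16BE, where each Plane 2 character takes a surrogate pair, so a 40-character sample is 80 UTF-16 code units. The last lines check the arithmetic behind splitting the fonts: the 'maxp' glyph count is a 16-bit value, so a font holds at most 65,535 glyphs.

```python
from __future__ import annotations

MAX_GLYPHS = 0xFFFF  # 'maxp' numGlyphs is a 16-bit value


def is_plane2_sample(text: str, min_len: int = 15, max_len: int = 40) -> bool:
    """True if text has min_len..max_len code points, all in Plane 2."""
    return min_len <= len(text) <= max_len and all(
        0x20000 <= ord(ch) <= 0x2FFFF for ch in text
    )


def utf16_units(text: str) -> int:
    """Length in UTF-16 code units, as stored in a Windows 'name' record."""
    return len(text.encode("utf-16-be")) // 2


if __name__ == "__main__":
    sample = "".join(chr(0x20BA8 + i) for i in range(15))  # placeholder code points
    print(is_plane2_sample(sample), len(sample), utf16_units(sample))  # True 15 30
    combined = 28_738 + 47_293  # counts quoted in the question
    print(combined, combined > MAX_GLYPHS)  # 76031 True
```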

U+201A9  faan2           (Cant.) to play
U+20325  wu1 wu3         (Cant.) to bow, stoop
U+20341  man3            (Cant.) an undesirable situation
U+204FC  sip3            (Cant.) a wedge; to thrust in
U+20544  nap1            (Cant.) 酒<U+20544>, a dimple
U+2076D  peng2           (Cant.) to fell, cut; to sweep away
U+20779  gaai3           (Cant.) to cut with a knife or scissors
U+20BA8  naai3           (Cant.) to tie, tow; bring along
U+20BA9  aa1 liu1        (Cant.) an interjection; rare, specialized
U+20BCB  jai4 jai5       (Cant.) naughty, inferior
U+20BE6  cai3            (Cant.) to eat, take a meal
U+20BFD  zi1             (Cant.) a final particle indicating affirmation
U+20C0B  jaau1           (Cant.) left-handed
U+20C32  eot1            (Cant.) to belch
U+20C41  tam3            (Cant.) to fool, trick, cheat
U+20C42  dat1            (Cant.) to put something or sit wherever one wishes; to rebuke, reproach
U+20C43  nip1            (Cant.) thin, flat; poor
U+20C53  ngai1           (Cant.) to importune, beg
U+20C58  ngaak6          (Cant.) contrary, opposing, against; disobedient
U+20C65  fik1 jit6 we5   (Cant.) wrangling, a noise; fitful; a soft fabric with no body
U+20C77  ming1           (Cant.) small
U+20C78  san2 seon2      (Cant.) phonetic
U+20C9C  zaang1          (Cant.) to owe
U+20CCF  ce2 ce6         (Cant.) interjection
U+20CD5  caau3           (Cant.) to search
U+20CD6  dap6            (Cant.) to strike, pound
U+20D15  miu2            (Cant.) to purse the lips; to wriggle
U+20D30  gau6            (Cant.) classifier for a piece or lump of something
U+20D47  keu4            (Cant.) peculiar, strange
U+20D48  mui2            (Cant.) to suck or chew without using the teeth
U+20D49  hong4           (Cant.) hope
U+20D69  go2             (Cant.) that
U+20D6F  gwit1 gwit3     (Cant.) onomatopoetic
U+20D7C  mang1 mang4     (Cant.) scars on the eyelid; phonetic
U+20D7E  waak1           (Cant.) eloquent, sharp-tongued
U+20D7F  pe1 pe5         (Cant.) a pair (from the Engl.); to stagger
U+20D9C  zai3            (Cant.) to do, work; to be willing
U+20DA7  dim6            (Cant.) straight, vertical; OK; to pick up with the fingers; verbal aspect marker of successful completion
U+20DB2  gap6 kap6       (Cant.) to stare at; to take a big bite
U+20E09  kak1            (Cant.) to block, obstruct
U+20E0A  tap1            (Cant.) an intensifying particle
U+20E0E  naa1            (Cant.) and, with
U+20E0F  ge2             (Cant.) final particle
U+20E10  kam1            (Cant.) to endure, last
U+20E11  soek3           (Cant.) soft, sodden
U+20E12  bou2            (Cant.) 生<U+20E12>人, a stranger
U+20E3A  ngaak6          (Cant.) contrary, opposing
U+20E6D  ko1             (Cant.) to call (Engl. loan-word)
U+20E73  git6            (Cant.) thick, viscous, dense
U+20E77  ngo4            (Cant.) to speak tirelessly
U+20E78  kam2            (Cant.) to cover, close up
U+20E7A  maai4           (Cant.) verbal aspect marker for completion or movement towards
U+20E7B  zam6            (Cant.) classifier for smells
U+20E8C  gwe1            (Cant.) timid
U+20E98  long1 long2     (Cant.) hard to get along with; to rinse, spread thin
U+20E9D  gaak3           (Cant.) final particle
U+20EA2  gaa1 gaa2       (Cant.) final particle
U+20EAA  he3 hi1         (Cant.) in a rush; slovenly
U+20EAB  leu1            (Cant.) strange, peculiar
U+20EAC  he2             (Cant.) final particle
U+20ED7  le4             (Cant.) imperative final particle
U+20ED8
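John says he pulled this list out of the Unihan database. A sketch of one way to do that, assuming Unihan's tab-separated "U+XXXX<TAB>field<TAB>value" line format (as in Unihan_Readings.txt) and assuming that "(Cant.)" at the start of kDefinition is the selection criterion; that criterion is inferred from how the list looks, not John's stated method. `cantonese_plane2` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterable


def cantonese_plane2(lines: Iterable[str]) -> dict[int, tuple[str, str]]:
    """Map Plane 2 code points whose kDefinition starts with "(Cant.)"
    to (kCantonese reading, kDefinition)."""
    readings: dict[int, str] = {}
    definitions: dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        code, field, value = line.split("\t", 2)
        cp = int(code[2:], 16)  # "U+201A9" -> 0x201A9
        if not 0x20000 <= cp <= 0x2FFFF:
            continue  # keep Plane 2 only
        if field == "kCantonese":
            readings[cp] = value
        elif field == "kDefinition":
            definitions[cp] = value
    return {
        cp: (readings.get(cp, ""), definition)
        for cp, definition in sorted(definitions.items())
        if definition.startswith("(Cant.)")
    }


if __name__ == "__main__":
    # With a real file (path is an assumption):
    #   with open("Unihan_Readings.txt", encoding="utf-8") as f:
    #       table = cantonese_plane2(f)
    sample = [
        "# lines in Unihan format, copied from the list above\n",
        "U+201A9\tkCantonese\tfaan2\n",
        "U+201A9\tkDefinition\t(Cant.) to play\n",
        "U+20C41\tkCantonese\ttam3\n",
        "U+20C41\tkDefinition\t(Cant.) to fool, trick, cheat\n",
    ]
    for cp, (reading, definition) in cantonese_plane2(sample).items():
        print(f"U+{cp:X}\t{reading}\t{definition}")
```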

Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
Peter Constable wrote,


On Mon, Nov 13, 2017 at 12:25 PM, Peter Constable
<peter...@microsoft.com> wrote:
> We don't want to add BMP characters to the ExtB fonts.
>
>
> Peter
>



RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Thanks for the suggestion. Alas, the fonts don't support that block.


Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Charlie Ruland 
via Unicode
Sent: Monday, November 13, 2017 12:05 PM
To: unicode@unicode.org
Subject: Re: Plane-2-only string

Many of the characters in the CJK Compatibility Ideographs Supplement block are 
quite common Chinese characters, or variants thereof. You could try to build 
Chinese sentences with these characters.






RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
Thanks. I’d need to know _at least something_ about what the characters 
signify, though, to have a sense of whether there’s anything potentially 
offensive.


Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy 
via Unicode
Sent: Monday, November 13, 2017 11:51 AM
To: James Kass <jameskass...@gmail.com>
Cc: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string

Maybe this test page?
http://www.i18nguy.com/unicode/supplementary-test.html





RE: Plane-2-only string

2017-11-13 Thread Peter Constable via Unicode
We don't want to add BMP characters to the ExtB fonts.


Peter





Re: Plane-2-only string

2017-11-13 Thread Charlie Ruland via Unicode
Many of the characters in the CJK Compatibility Ideographs Supplement block 
are quite common Chinese characters, or variants thereof. You could try 
to build Chinese sentences with these characters.
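One caveat worth checking with this approach: every character in the CJK Compatibility Ideographs Supplement block (U+2F800..U+2FA1F) has a canonical singleton decomposition, so any process that applies NFC or NFD replaces it with the corresponding unified ideograph, which may be a BMP character. A minimal sketch using Python's unicodedata; the helper names are hypothetical.

```python
import unicodedata


def in_compat_supplement(ch: str) -> bool:
    """True for code points in the CJK Compatibility Ideographs Supplement block."""
    return 0x2F800 <= ord(ch) <= 0x2FA1F


def survives_normalization(text: str) -> bool:
    """True if neither NFC nor NFD changes the text."""
    return all(unicodedata.normalize(form, text) == text for form in ("NFC", "NFD"))


if __name__ == "__main__":
    ch = "\U0002F800"  # CJK COMPATIBILITY IDEOGRAPH-2F800
    print(in_compat_supplement(ch), survives_normalization(ch))  # True False
    # The canonical decomposition: a single code point it is replaced with.
    print(unicodedata.decomposition(ch))
```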







Re: Plane-2-only string

2017-11-13 Thread Philippe Verdy via Unicode
Maybe this test page?
http://www.i18nguy.com/unicode/supplementary-test.html




Re: Plane-2-only string

2017-11-13 Thread James Kass via Unicode
A font's sample text can be used in place of the default "The quick
brown fox..." text, which is used to illustrate the typeface in
applications that support that feature.

One approach would be to find a non-gibberish text string using some
Plane 2 characters and add the needed BMP glyphs to the font, mapped to
the BMP PUA. If only a handful of BMP CJK glyphs were added to the font
mapped to their standard code points, the font might need to claim to
support BMP CJK (when in fact it does not) in order to display the
sample text. Or (if standard code points are used) some applications
might auto-detect the font as supporting BMP CJK when it doesn't really
support that range.
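A minimal sketch of the PUA remapping James describes, with hypothetical names: each distinct BMP CJK ideograph in a candidate sample (the ranges U+3400..U+4DBF and U+4E00..U+9FFF are assumed here) is assigned a code point in the BMP Private Use Area (U+E000..U+F8FF), and the function returns the remapped text together with the extra cmap entries the font would need. PUA code points carry no standard meaning, so the remapped text displays as intended only in that font.

```python
from __future__ import annotations

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def is_bmp_cjk(cp: int) -> bool:
    """CJK Unified Ideographs and Extension A (BMP only)."""
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def remap_bmp_cjk_to_pua(text: str, start: int = PUA_START) -> tuple[str, dict[int, int]]:
    """Replace each distinct BMP CJK ideograph with a PUA code point.

    Returns the remapped text and {original code point: PUA code point}.
    """
    mapping: dict[int, int] = {}
    out: list[str] = []
    for ch in text:
        cp = ord(ch)
        if is_bmp_cjk(cp):
            if cp not in mapping:
                pua = start + len(mapping)  # next free PUA slot
                if pua > PUA_END:
                    raise ValueError("BMP Private Use Area exhausted")
                mapping[cp] = pua
            out.append(chr(mapping[cp]))
        else:
            out.append(ch)  # Plane 2 and other characters pass through
    return "".join(out), mapping


if __name__ == "__main__":
    text, cmap_additions = remap_bmp_cjk_to_pua("\u6b23\U000201A9\u6b23\u6a52")
    print({f"U+{k:04X}": f"U+{v:04X}" for k, v in cmap_additions.items()})
    # {'U+6B23': 'U+E000', 'U+6A52': 'U+E001'}
```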
