Re: Plane-2-only string
Hello,

Here's a list of frequently used Han characters for Hakka and Minnan (Chinese dialects). It contains several Ext-B characters that you can test:

http://bobbytung.github.io/TaigiHakkaIdeograph/
https://docs.google.com/spreadsheets/d/18CUbZ7tsvZ4QbUj3xcfYi9EGqsft4T37WtUMX9v2STQ/pubhtml

Bobby Tung
W3C invited expert
Editor of CLREQ

> On 14 November 2017, at 1:45 PM, via Unicode <unicode@unicode.org> wrote:
>
> Dear Peter,
>
> Since the Chinese characters below are meaningless in Chinese, using them should not be a first choice; they are gibberish, just not complete gibberish.
>
> Plane 2 has a fair number of older Chinese characters, so someone with a knowledge of ancient Chinese might well be able to make something meaningful. Running a competition in China would be one way to get suggestions; spotting a good suggestion is easier than making one.
>
> Plane 2 has Cantonese, Vietnamese and Zhuang characters. The number of Cantonese characters is small, so making phrases using only them would be difficult. Both Vietnamese and Zhuang have a much larger number of characters, so it would be much easier to make something meaningful.
>
> The following Zhuang proverb, or saying:
>
> 뭴�풹㐡撘霋��
>
> "Plant sweet potatoes in the field, and raise pigs in the sty." [lit.: house, as the bottom floor of a traditional house is used for livestock and people live on the floor above.]
>
> However, the third and eighth characters are not the most commonly used.
>
> Regards
> John
Re: Plane-2-only string
With over a thousand Zhuang characters, Zhuang would work, though of course it would not have punctuation. Off the top of my head, something like:

톸톛昭퓨쾀쿇
톸톨퓨멒얙
컹퓨፹왙෯꽖

In romanised Zhuang:

Gou bae ranz gyoengqde
gou youq ranz ndaw gwn haeux
aen ranz baihlaeng miz naz

In English:

I went to their house
I ate a meal in the house
behind the house were paddy fields

A native speaker would of course do much better.

Regards
John Knightley
Re: Plane-2-only string
Perhaps the Martian language (http://en.wikipedia.org/wiki/Martian_language) should be considered as a way to construct an example Chinese sentence from characters that are only within Plane 2? It could probably be understood by more people than something in Cantonese.
Re: Plane-2-only string
Philippe Verdy wrote,

> ... As well the newline don't need any font, it is synthetized by renderers.

It's true that fonts don't need to have glyphs mapped for control characters, but I'd hesitate to use any control character in a font's sample text field because of the field's intended use. But the point is moot, since Peter has reminded us that the fonts in question already have some BMP characters mapped, including certain punctuation characters.

An ExtB font with BMP Basic Latin could display the English-language default sample text "The quick brown fox..." with no problem, but a non-English locale might substitute a default text string that the font could not support. So it's probably best to have *something* in that field representing characters the font covers.
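To make James's point concrete, here is a minimal sketch of checking a candidate sample string against a font's character map before putting it in the sample-text field. It assumes the cmap is available as a plain dict from code point to glyph name (for example, what fontTools returns from `TTFont(path).getBestCmap()`); the function name `unsupported_chars` and the toy glyph names are hypothetical, and skipping control characters such as the newline reflects Philippe's remark that renderers synthesize line breaks, not a rule of the OpenType spec.

```python
import unicodedata


def unsupported_chars(sample: str, cmap: dict) -> list:
    """Return the distinct characters of `sample` that `cmap` does not map.

    `cmap` is assumed to be a dict from integer code point to glyph name.
    Control characters (general category "Cc", such as a newline) are
    skipped, on the assumption that the renderer handles them without a glyph.
    """
    missing = []
    for ch in sample:
        if unicodedata.category(ch) == "Cc":
            continue  # line breaks etc. are laid out by the renderer
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing


# Toy cmap: space plus two Extension B ideographs (glyph names invented).
toy_cmap = {0x20: "space", 0x20000: "u20000", 0x20001: "u20001"}
sample = "\U00020000\U00020001\n\U00020000 \u3002"  # ends with IDEOGRAPHIC FULL STOP
print([f"U+{ord(c):04X}" for c in unsupported_chars(sample, toy_cmap)])
```

Here only U+3002 is reported: the newline is skipped and the space is mapped, which is the situation James describes where CJK punctuation would require BMP coverage the ExtB font lacks.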
Re: Plane-2-only string
Any font would likely map the space (and probably, for any CJK font, the ideographic space). Also, the newline doesn't need any font support: it is synthesized by renderers. This could be used to compose something like a Japanese haiku with some meaning...

2017-11-13 23:54 GMT+01:00 James Kass via Unicode:
> Peter Constable wrote,
>
> > “
> > 欣欤欥欦欧
> > 橒橓橔橕橖
> > 裫裬裭裮裯”
>
> “ 欣欤欥欦欧 橒橓橔橕橖 裫裬裭裮裯”
>
> It looks good in blocks on four separate lines, but would a typical font viewing or comparison tool be expected to break it down into four lines? The pattern is still apparent if displayed on just one line, but separating the blocks with spaces or any punctuation would require BMP characters in the ExtB font.
>
> “ 欣欤欥欦欧橒橓橔橕橖裫裬裭裮裯”
RE: Plane-2-only string
As mentioned in my initial mail, the fonts support the Basic Latin block from the BMP.

Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of James Kass via Unicode
Sent: Monday, November 13, 2017 2:54 PM
To: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string
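The figures in Peter's original background note (28,738 BMP characters in Simsun and 47,293 Plane 2 characters in Simsun-ExtB) can be checked against the OpenType limit: the glyph count in the `maxp` table is a 16-bit value, so one font holds at most 65,535 glyphs. The sketch below assumes one glyph per character, which is only a lower bound, and ignores the Basic Latin glyphs that the ExtB font also carries.

```python
# Character counts quoted in Peter Constable's message.
SIMSUN_BMP_CHARS = 28_738
SIMSUN_EXTB_PLANE2_CHARS = 47_293

# maxp.numGlyphs is a uint16, so a single OpenType font has at most 65,535 glyphs.
MAX_GLYPHS = 0xFFFF

# Assumes one glyph per character (a lower bound on the real glyph count).
combined = SIMSUN_BMP_CHARS + SIMSUN_EXTB_PLANE2_CHARS
print(f"{combined:,} glyphs needed vs {MAX_GLYPHS:,} allowed")
print("fits in one font:", combined <= MAX_GLYPHS)
```

This prints 76,031 needed against 65,535 allowed, so even before counting any extra glyphs the two sets cannot share one font.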
RE: Plane-2-only string
I discussed this with one of my Chinese co-workers, and we came up with the following:

“
欣欤欥欦欧
橒橓橔橕橖
裫裬裭裮裯”

Factors in the choice of characters were:
- different radicals
- for a given radical, a sequence of consecutive characters, so people get the idea that it's not a sentence but just a sequence of characters with related meanings
- radical groups that increase in complexity

It's not a sentence that can be read, but there's an obvious pattern, so it's also not completely gibberish.

Peter

-Original Message-
From: James Kass [mailto:jameskass...@gmail.com]
Sent: Monday, November 13, 2017 2:29 PM
To: Peter Constable <peter...@microsoft.com>
Cc: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string
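The construction Peter describes (runs of consecutive characters that share a radical) can be sketched in a few lines. Extension B is arranged in radical-stroke order, so neighboring code points often share a radical; the starting code points below are hypothetical placeholders, and real choices would have to be checked against the Unihan radical-stroke data, and for unfortunate meanings, before use. Joining the runs with no separator keeps the string strictly within Plane 2.

```python
PLANE2_FIRST, PLANE2_LAST = 0x20000, 0x2FFFF


def consecutive_run(start: int, length: int = 5) -> str:
    """Return `length` consecutive characters beginning at code point `start`."""
    return "".join(chr(start + i) for i in range(length))


# Hypothetical starting points, for illustration only; they are NOT vetted
# for shared radicals, increasing complexity, or meaning.
HYPOTHETICAL_STARTS = [0x20000, 0x21000, 0x22000]

runs = [consecutive_run(start) for start in HYPOTHETICAL_STARTS]
sample = "".join(runs)  # no separator, so every character stays in Plane 2

assert all(PLANE2_FIRST <= ord(c) <= PLANE2_LAST for c in sample)
print(len(sample), [f"U+{ord(r[0]):X}" for r in runs])
```

Three runs of five give a 15-character string, the lower end of the 15 to 40 range Peter asked for; longer runs or more radical groups scale it up.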
Re: Plane-2-only string
Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

So the sample text would lack punctuation. Given that the Supplementary Ideographic Plane is composed of rare and historical characters from multiple sources, I suspect that the short answer to Peter's original question is: "No".
RE: Plane-2-only string
Would a typical Chinese speaker be likely to recognize these as used in Cantonese? (I wouldn't want a font's sample-text string to give the impression that it's a Cantonese font, unless it were specifically intended for Cantonese.)

-Original Message-
From: jenk...@apple.com [mailto:jenk...@apple.com]
Sent: Monday, November 13, 2017 12:46 PM
To: Peter Constable <peter...@microsoft.com>
Cc: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string
Re: Plane-2-only string
2017-11-13 21:48 GMT+01:00 James Kass:
> Peter Constable wrote,
>
> >> May be this test page ?
> >>
> >> http://www.i18nguy.com/unicode/supplementary-test.html
> >
> > Thanks. I’d need to know _at least something_ about what the characters signify, though, to have a sense of whether there’s anything potentially offensive.
>
> The Plane 2 characters on that page appear to be random.

That's probable, but the authors claim these are common characters. It's possible they collected statistics from some corpus to find some of the most widely used characters in Plane 2, without needing to understand what they would mean if put side by side. (I had already noted that there was no punctuation at all: the displayed run is too long for typical Chinese text, and I would in fact expect some CJK punctuation in it.)

Maybe we could compile a list of Chinese toponyms using these characters, select those that use more than one Plane 2 character, then separate the names with CJK commas and end with a CJK full stop. Some Wikidata or OSM data search could be used to compile such a list. I think these toponyms are more likely to be found in Cantonese or Taiwanese sources using the zh-Hant variant. Note, however, that Wikidata does not distinguish zh-Hans and zh-Hant, as Wikimedia wikis use a transliterator; I doubt this transliterator transforms Plane 2 characters, which should remain unchanged, since most of them are used in both traditional and simplified text.
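Philippe's selection rule (keep names with more than one Plane 2 character, separate them with CJK commas, end with a CJK full stop) could look roughly like this. The candidate names are placeholder strings built from arbitrary code points, not real toponyms; a real list would have to come from the Wikidata or OSM search he describes. Using U+3001 IDEOGRAPHIC COMMA as the "CJK comma" is an assumption, and both it and U+3002 IDEOGRAPHIC FULL STOP are BMP characters, so the output only displays fully if the font maps them too.

```python
IDEOGRAPHIC_COMMA = "\u3001"      # 、 (a BMP character)
IDEOGRAPHIC_FULL_STOP = "\u3002"  # 。 (a BMP character)


def plane2_count(name: str) -> int:
    """Count the characters of `name` that lie in Plane 2 (U+20000..U+2FFFF)."""
    return sum(1 for ch in name if 0x20000 <= ord(ch) <= 0x2FFFF)


def toponym_sample(names: list, min_plane2: int = 2) -> str:
    """Join names having at least `min_plane2` Plane 2 characters into one sentence."""
    picked = [n for n in names if plane2_count(n) >= min_plane2]
    if not picked:
        return ""
    return IDEOGRAPHIC_COMMA.join(picked) + IDEOGRAPHIC_FULL_STOP


# Placeholder "names" made of arbitrary code points, NOT real place names.
candidates = [
    "\U00020000\U00020001",        # two Plane 2 characters: kept
    "\U00020002\u5c71",            # only one Plane 2 character: dropped
    "\U00020003\U00020004\u6751",  # two Plane 2 characters: kept
]
print([f"U+{ord(c):04X}" for c in toponym_sample(candidates)])
```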
Re: Plane-2-only string
Peter Constable wrote,

>> May be this test page ?
>>
>> http://www.i18nguy.com/unicode/supplementary-test.html
>
> Thanks. I’d need to know _at least something_ about what the characters signify, though, to have a sense of whether there’s anything potentially offensive.

The Plane 2 characters on that page appear to be random.
Re: Plane-2-only string
Peter Constable wrote,

> We don't want to add BMP characters to the ExtB fonts.

How about Plane 15 or 16, then?
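James's two suggestions in this thread (mapping the extra glyphs to the BMP Private Use Area, or to the private-use Planes 15 and 16) amount to the same transformation with a different base code point. A minimal sketch, assuming everything outside Plane 2 should be moved to private use: it rewrites the sample string and returns the table the font would need in its cmap. The function name is hypothetical, and the trade-off is that the rewritten string displays correctly only with this particular font.

```python
BMP_PUA_START = 0xE000        # Private Use Area, U+E000..U+F8FF
PLANE15_PUA_START = 0xF0000   # Supplementary Private Use Area-A, U+F0000..U+FFFFD
PLANE16_PUA_START = 0x100000  # Supplementary Private Use Area-B, U+100000..U+10FFFD


def remap_to_pua(text: str, base: int = BMP_PUA_START):
    """Move every non-Plane-2 character of `text` to private-use code points.

    Returns (remapped_text, table), where `table` maps each original
    character to the PUA code point assigned to it. Assignment follows
    order of first appearance; there is no range-overflow check, which is
    fine for a sample string of a few dozen characters.
    """
    table = {}
    out = []
    for ch in text:
        if 0x20000 <= ord(ch) <= 0x2FFFF:
            out.append(ch)  # Plane 2 characters are kept as-is
            continue
        if ch not in table:
            table[ch] = base + len(table)
        out.append(chr(table[ch]))
    return "".join(out), table


text = "\U00020000\u4e00\U00020001\u4e00\u3002"
remapped, table = remap_to_pua(text, PLANE15_PUA_START)
print([f"U+{ord(c):04X}" for c in remapped])
print({f"U+{ord(k):04X}": f"U+{v:04X}" for k, v in table.items()})
```

Choosing a Plane 15 or 16 base rather than the BMP PUA keeps the font from gaining any new BMP mappings, which seems closer to Peter's stated constraint; how sample-text consumers treat private-use text is not tested here.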
Re: Plane-2-only string
U+201A9 U+20325 U+20341 U+204FC U+20544 U+2076D U+20779 U+20BA8 U+20BA9 U+20BCB
U+20BE6 U+20BFD U+20C0B U+20C32 U+20C41 U+20C42 U+20C43 U+20C53 U+20C58 U+20C65
U+20C77 U+20C78 U+20C9C U+20CCF U+20CD5 U+20CD6 U+20D15 U+20D30 U+20D47 U+20D48
U+20D49 U+20D69 U+20D6F U+20D7C U+20D7E U+20D7F U+20D9C U+20DA7 U+20DB2 U+20E09

That is an example of forty Cantonese-specific characters from Extension B which are not obscene (that I'm aware of). For the curious, I've appended at the bottom the full list of 280 for all of Plane 2, which I was able to pull out of the Unihan database. I'm sure some enterprising poet can make something out of them.

> On Nov 13, 2017, at 11:20 AM, Peter Constable via Unicode wrote:
>
> I’m wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn’t be gibberish?
> [...]

U+201A9 faan2 (Cant.) to play
U+20325 wu1 wu3 (Cant.) to bow, stoop
U+20341 man3 (Cant.) an undesirable situation
U+204FC sip3 (Cant.) a wedge; to thrust in
U+20544 nap1 (Cant.) 酒[U+20544], a dimple
U+2076D peng2 (Cant.) to fell, cut; to sweep away
U+20779 gaai3 (Cant.) to cut with a knife or scissors
U+20BA8 naai3 (Cant.) to tie, tow; bring along
U+20BA9 aa1 liu1 (Cant.) an interjection; rare, specialized
U+20BCB jai4 jai5 (Cant.) naughty, inferior
U+20BE6 cai3 (Cant.) to eat, take a meal
U+20BFD zi1 (Cant.) a final particle indicating affirmation
U+20C0B jaau1 (Cant.) left-handed
U+20C32 eot1 (Cant.) to belch
U+20C41 tam3 (Cant.) to fool, trick, cheat
U+20C42 dat1 (Cant.) to put something or sit wherever one wishes; to rebuke, reproach
U+20C43 nip1 (Cant.) thin, flat; poor
U+20C53 ngai1 (Cant.) to importune, beg
U+20C58 ngaak6 (Cant.) contrary, opposing, against; disobedient
U+20C65 fik1 jit6 we5 (Cant.) wrangling, a noise; fitful; a soft fabric with no body
U+20C77 ming1 (Cant.) small
U+20C78 san2 seon2 (Cant.) phonetic
U+20C9C zaang1 (Cant.) to owe
U+20CCF ce2 ce6 (Cant.) interjection
U+20CD5 caau3 (Cant.) to search
U+20CD6 dap6 (Cant.) to strike, pound
U+20D15 miu2 (Cant.) to purse the lips; to wriggle
U+20D30 gau6 (Cant.) classifier for a piece or lump of something
U+20D47 keu4 (Cant.) peculiar, strange
U+20D48 mui2 (Cant.) to suck or chew without using the teeth
U+20D49 hong4 (Cant.) hope
U+20D69 go2 (Cant.) that
U+20D6F gwit1 gwit3 (Cant.) onomatopoetic
U+20D7C mang1 mang4 (Cant.) scars on the eyelid; phonetic
U+20D7E waak1 (Cant.) eloquent, sharp-tongued
U+20D7F pe1 pe5 (Cant.) a pair (from the Engl.); to stagger
U+20D9C zai3 (Cant.) to do, work; to be willing
U+20DA7 dim6 (Cant.) straight, vertical; OK; to pick up with the fingers; verbal aspect marker of successful completion
U+20DB2 gap6 kap6 (Cant.) to stare at; to take a big bite
U+20E09 kak1 (Cant.) to block, obstruct
U+20E0A tap1 (Cant.) an intensifying particle
U+20E0E naa1 (Cant.) and, with
U+20E0F ge2 (Cant.) final particle
U+20E10 kam1 (Cant.) to endure, last
U+20E11 soek3 (Cant.) soft, sodden
U+20E12 bou2 (Cant.) 生[U+20E12]人, a stranger
U+20E3A ngaak6 (Cant.) contrary, opposing
U+20E6D ko1 (Cant.) to call (Engl. loan-word)
U+20E73 git6 (Cant.) thick, viscous, dense
U+20E77 ngo4 (Cant.) to speak tirelessly
U+20E78 kam2 (Cant.) to cover, close up
U+20E7A maai4 (Cant.) verbal aspect marker for completion or movement towards
U+20E7B zam6 (Cant.) classifier for smells
U+20E8C gwe1 (Cant.) timid
U+20E98 long1 long2 (Cant.) hard to get along with; to rinse, spread thin
U+20E9D gaak3 (Cant.) final particle
U+20EA2 gaa1 gaa2 (Cant.) final particle
U+20EAA he3 hi1 (Cant.) in a rush; slovenly
U+20EAB leu1 (Cant.) strange, peculiar
U+20EAC he2 (Cant.) final particle
U+20ED7 le4 (Cant.) imperative final particle
U+20ED8
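The message does not say how the 280 entries were pulled from Unihan. One plausible way, sketched here under that assumption, is to read the tab-separated Unihan data (the `kCantonese` and `kDefinition` fields, both in Unihan_Readings.txt), keep Plane 2 code points whose definition is tagged "(Cant.)", and print them in code point order. The function name is hypothetical, and the sample input lines are modeled on the first two entries of the list rather than copied from the file.

```python
from collections import defaultdict


def cantonese_plane2(lines):
    """Return 'U+XXXXX reading definition' strings for Plane 2 characters
    whose kDefinition is tagged "(Cant.)", sorted by code point.

    `lines` is assumed to follow Unihan's format, "U+XXXX<TAB>field<TAB>value",
    with "#" comment lines.
    """
    fields = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        code, key, value = parts
        if key in ("kCantonese", "kDefinition"):
            fields[code][key] = value

    rows = []
    for code, f in fields.items():
        cp = int(code[2:], 16)
        definition = f.get("kDefinition", "")
        if 0x20000 <= cp <= 0x2FFFF and "(Cant.)" in definition:
            rows.append((cp, f"{code} {f.get('kCantonese', '')} {definition}"))
    return [text for _, text in sorted(rows)]


sample_lines = [
    "# illustrative excerpt, not copied from Unihan",
    "U+20325\tkCantonese\twu1 wu3",
    "U+20325\tkDefinition\t(Cant.) to bow, stoop",
    "U+201A9\tkCantonese\tfaan2",
    "U+201A9\tkDefinition\t(Cant.) to play",
    "U+4E00\tkDefinition\tone",
]
for row in cantonese_plane2(sample_lines):
    print(row)
```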
Re: Plane-2-only string
Peter Constable wrote,

On Mon, Nov 13, 2017 at 12:25 PM, Peter Constable <peter...@microsoft.com> wrote:
> We don't want to add BMP characters to the ExtB fonts.
>
> Peter
RE: Plane-2-only string
Thanks for the suggestion. Alas, the fonts don't support that block.

Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Charlie Ruland via Unicode
Sent: Monday, November 13, 2017 12:05 PM
To: unicode@unicode.org
Subject: Re: Plane-2-only string
RE: Plane-2-only string
Thanks. I’d need to know _at least something_ about what the characters signify, though, to have a sense of whether there’s anything potentially offensive.

Peter

From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy via Unicode
Sent: Monday, November 13, 2017 11:51 AM
To: James Kass <jameskass...@gmail.com>
Cc: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string

May be this test page ?

http://www.i18nguy.com/unicode/supplementary-test.html
RE: Plane-2-only string
We don't want to add BMP characters to the ExtB fonts.

Peter

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of James Kass via Unicode
Sent: Monday, November 13, 2017 11:39 AM
To: Unicode list <unicode@unicode.org>
Subject: Re: Plane-2-only string
Re: Plane-2-only string
Many of the characters in the CJK Compatibility Ideographs Supplement block are quite common Chinese characters, or variants thereof. You could try to build Chinese sentences with these characters.

On Mon, 13 Nov 2017 at 20:20 GMT+01:00 Peter Constable via Unicode wrote:
> I’m wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn’t be gibberish?
> [...]
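Charlie's suggestion depends on which Plane 2 block a character comes from. A small lookup, sketched below with the Plane 2 block ranges as of Unicode 10.0 (the version current at the time of this thread), can tag each character of a candidate string. It also checks a caveat worth knowing before using the CJK Compatibility Ideographs Supplement in sample text: each of those characters has a canonical decomposition, so Unicode normalization replaces it with a unified ideograph, which may be a BMP character that the ExtB fonts do not cover. The function names are hypothetical.

```python
import unicodedata
from typing import Optional

# Plane 2 blocks in Unicode 10.0 (later versions add more, e.g. Extension I).
PLANE2_BLOCKS = [
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def plane2_block(ch: str) -> Optional[str]:
    """Return the name of the Plane 2 block containing `ch`, or None."""
    cp = ord(ch)
    for first, last, name in PLANE2_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def stable_under_nfc(ch: str) -> bool:
    """True if NFC normalization leaves `ch` unchanged."""
    return unicodedata.normalize("NFC", ch) == ch


for ch in ["\U00020000", "\U0002F800", "\u4e00"]:
    print(f"U+{ord(ch):04X}", plane2_block(ch), stable_under_nfc(ch))
```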
Re: Plane-2-only string
Maybe this test page?

http://www.i18nguy.com/unicode/supplementary-test.html

2017-11-13 20:38 GMT+01:00 James Kass via Unicode:
> A font's sample text can be used in place of the default "The quick brown fox..." text which is used to illustrate the typeface in applications which support that feature.
> [...]
Re: Plane-2-only string
A font's sample text can be used in place of the default "The quick brown fox..." text, which is used to illustrate the typeface in applications that support that feature.

One approach would be to find a non-gibberish text string using some Plane 2 characters and add the needed BMP glyphs to the font, mapped to the BMP PUA. If only a handful of BMP CJK glyphs were added to the font mapped to their standard code points, the font might need to claim support for BMP CJK (when in fact it does not) in order to display the sample text. Or, if standard code points are used, the font might be auto-detected as supporting BMP CJK by some applications when it doesn't really support that range.

On Mon, Nov 13, 2017 at 10:20 AM, Peter Constable via Unicode wrote:
> I’m wondering if anyone could come up with a string of 15 to 40 characters _using only plane 2 characters_ that wouldn’t be gibberish?
>
> We are considering adding sample-text strings in some of our fonts. (In OpenType, the ‘name’ table can take sample-text strings using name ID 19.) One particular issue we have is the Simsun-ExtB and MingLiU-ExtB fonts, which have CJK characters from plane 2 only.
>
> Background:
> The Simsun-ExtB and MingLiU-ExtB fonts are meant to complement the Simsun and MingLiU fonts: the combined glyph count exceeds the number of glyphs that can be added in a single OpenType font, and so the “ExtB” fonts are used to contain all of the Plane 2 characters that are supported. For example, the Simsun font supports 28,738 BMP characters and no plane 2 characters, while Simsun-ExtB supports the Basic Latin block from the BMP plus 47,293 plane 2 characters. The combined glyph count exceeds 64K, so they can’t go into a single font.
>
> Peter
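Peter's constraints (15 to 40 characters, Plane 2 only, stored as a name ID 19 string) can be checked mechanically. The sketch below assumes "characters" means code points and that the string goes into a Windows-platform name record, which stores text as UTF-16BE; each Plane 2 character then needs a surrogate pair, so a 40-character sample takes 160 bytes. The function names are hypothetical.

```python
PLANE2_FIRST, PLANE2_LAST = 0x20000, 0x2FFFF


def is_plane2_sample(text: str, min_len: int = 15, max_len: int = 40) -> bool:
    """True if `text` has min_len..max_len code points, all in Plane 2."""
    return min_len <= len(text) <= max_len and all(
        PLANE2_FIRST <= ord(ch) <= PLANE2_LAST for ch in text
    )


def name_record_bytes(text: str) -> bytes:
    """Encode `text` as UTF-16BE, the encoding of Windows-platform name strings."""
    return text.encode("utf-16-be")


candidate = "".join(chr(0x20000 + i) for i in range(15))
print(is_plane2_sample(candidate))            # True
print(len(name_record_bytes(candidate)))      # 60: one surrogate pair per character
print(name_record_bytes("\U00020000").hex())  # d840dc00
```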