Re: Encoding italic

2019-01-15 Thread James Kass via Unicode



Responding to David Starner,

> I might complain about the people who claim to like plain text yet would
> only be happy with massive changes to it, though.

Most movie lovers welcomed talkies.

People are free to cling to their rotary phones as long as they like.  
They just can't press the pound sign.


> However, plain text can be used standalone, and it can be used inside
> programs and other formats.

That remains true even post-emoji.  How would italics change that?

> Dismissing the people who use Unicode in ways that aren't plain text
> is unfair and hurts your case.

It wasn't my intention to be dismissive, much, so point taken. 
Discussions like this one exist so that people can express concerns and 
share ideas towards resolutions.


> Adding italics to Unicode will complicate the implementation of all rich
> text applications that currently support italics.

Would there be any advantages to rich-text apps if italics were added to 
Unicode?  Is there any cost/benefit data?  You've made an assertion 
about complication to rich-text apps which I can neither confirm nor refute.


One possible advantage would be interoperability.  People snagging 
snippets of text from web pages or word processors and dropping data 
into their plain-text windows wouldn't be bamboozled by the unexpected.  
If computer text is getting exchanged, isn't it better when it can be 
done in a standard fashion?




Re: Encoding italic

2019-01-15 Thread David Starner via Unicode
On Tue, Jan 15, 2019 at 5:17 PM James Kass via Unicode wrote:

> Enabling plain-text doesn't make rich-text poor.
>

Adding italics to Unicode will complicate the implementation of all rich
text applications that currently support italics.


> People who regard plain-text with derision, disdain, or contempt have
> every right to hold and share opinions about what plain-text is *for*
> and in which direction it should be heading.  Such opinions should
> receive all the consideration they deserve.
>

Really? There's no one here who regards plain text with derision, disdain or
contempt. I might complain about the people who claim to like plain text
yet would only be happy with massive changes to it, though.

However, plain text can be used standalone, and it can be used inside
programs and other formats. Dismissing the people who use Unicode in ways
that aren't plain text is unfair and hurts your case.


Re: Encoding italic (was: A last missing link)

2019-01-15 Thread James Kass via Unicode



Victor Gaultney wrote,

> Use of variation selectors, a single character modifier, or combining
> characters also seem to be less useful options, as they act at the
> individual character level and are highly impractical. They also violate
> the key concept that italics are a way of marking a span of text as
> 'special' - not individual letters. Matched punctuation works the same
> way and is a good fit for italic.


The VS possibility would double the character count of any strings 
including them.  That may make it undesirable for groups like Twitter 
who have limits.  But math (mis)use doesn't affect the character count.  
If the VS method were to be used, the math alphanumerics might continue 
to be used where possible, at least by Twitter users who already employ 
the math-alphas to make their corpus of legacy data.
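To make the character-count point concrete, here is a minimal Python sketch comparing two ways of "italicizing" ASCII text. The VS14 scheme is hypothetical (Unicode defines no italic variation sequences), and `to_math_italic`, `with_vs14` and `utf16_units` are illustrative names, not an existing API. How Twitter or any other platform actually counts characters is not modeled here.

```python
VS14 = "\ufe0d"  # VARIATION SELECTOR-14


def to_math_italic(text: str) -> str:
    """Map ASCII letters to MATHEMATICAL ITALIC letters; leave everything else alone.

    The italic small h slot (U+1D455) is reserved because U+210E PLANCK
    CONSTANT was already encoded, so 'h' maps there instead.
    """
    out = []
    for ch in text:
        if ch == "h":
            out.append("\u210e")
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def with_vs14(text: str) -> str:
    """Hypothetical VS scheme: put VS14 after every letter."""
    return "".join(ch + VS14 if ch.isalpha() else ch for ch in text)


def utf16_units(text: str) -> int:
    """Length in UTF-16 code units (characters outside the BMP count twice)."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    word = "italic"
    for label, s in [("plain", word), ("math italic", to_math_italic(word)), ("VS14", with_vs14(word))]:
        print(f"{label:12} code points={len(s):2}  UTF-16 units={utf16_units(s):2}")
```

For "italic", the plain and math-italic forms are both 6 code points while the VS14 form is 12. Counted in UTF-16 code units, however, the math-italic form is also 12, because every Mathematical Alphanumeric letter lies outside the BMP; which unit a platform counts therefore matters.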


Using VS arose in the parent thread as a way of avoiding the necessity 
of adding additional characters to the standard.  (But we don't seem to 
be running out of available code space.)  The purpose of VS is to 
preserve variant letter form distinctions in plain-text, which seems to 
apply to italics.  Further, VS is an existing mechanism which wouldn't 
be expected to impact searching and so forth on savvy systems.  (An 
opening/closing pair of control characters also shouldn't impact 
searching.)  Finally, VS already works in existing technology and there 
wouldn't be a long down-time waiting for updates to the standard and 
implementation of same. (Not that we should rush to judgment or 
"solutions" here, just that an ad-hoc "solution" is possible and could 
be implemented by third-parties.)


Concerns about statefulness in plain-text exist.  Treating "italic" as 
an opening/closing "punctuation" may help get around such concerns.  
IIRC, it was proposed that the Egyptian cartouche be handled that way.


Like emoji, people who don't like italics in plain text don't have to 
use them.




Re: Encoding italic

2019-01-15 Thread Marcel Schneider via Unicode

On 16/01/2019 02:15, James Kass via Unicode wrote:


> Enabling plain-text doesn't make rich-text poor.
>
> People who regard plain-text with derision, disdain, or contempt have
> every right to hold and share opinions about what plain-text is *for*
> and in which direction it should be heading.  Such opinions should
> receive all the consideration they deserve.


Perhaps there’s a need to sort out what plain text is thought to be
across different user communities. Sometimes “plain text” is just a
synonym for _draft style_, considering that a worker should not need
to follow any style guide, because (a) many normal keyboards don’t
enable users to do so, and (b) the process is too complicated using
mainstream extended keyboard layouts.

From this point of view, any demand to key in text directly in a
locale’s accurate digital representation is likely to be considered
an unreachable challenge and thus, an offense.

But indeed, people are entitled not to lower their requirements
as to what text is supposed to look like. From that POV, draft style
is unbearable, and being bound to it is then the actual offense.

The first step would then be to beef up that draft style so that it
integrates all characters needed for a fully featured representation
of a locale’s language, from curly quotes to preformatted superscript.
Unicode makes it possible, in direct continuity with what was set up
in ISO/IEC 6937. The next step is to design appropriate input methods.
Today, we can even get back the u̲n̲d̲e̲r̲l̲i̲n̲e̲ that we were deprived of,
by adding an appropriate dead key or combining diacritic, but that’s
still experimental. It already works better, though, than the Unicode
Syriac abbreviation control, whose overline is *not* rendered in
Chrome on Linux. In the same way, Unicode could encode a Latin italic
control, or as Victor Gaultney proposes, a Latin italic start control
and a Latin italic end control, directing the rendering engine to
pick italics instead of drawing a line along the rest of the word.
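The combining-diacritic route to underlining can be sketched in a few lines of Python. This assumes U+0332 COMBINING LOW LINE is the diacritic meant (the post does not name it), and whether the marks join into one continuous line depends on the font and renderer; `underline` and `strip_underline` are illustrative names.

```python
COMBINING_LOW_LINE = "\u0332"


def underline(text: str) -> str:
    """Append U+0332 COMBINING LOW LINE after every non-whitespace character.

    Whitespace is left bare, so the underline breaks between words.
    """
    return "".join(ch if ch.isspace() else ch + COMBINING_LOW_LINE for ch in text)


def strip_underline(text: str) -> str:
    """Undo underline() by removing every U+0332."""
    return text.replace(COMBINING_LOW_LINE, "")


if __name__ == "__main__":
    marked = underline("underline me")
    print(marked)
    print(len("underline me"), len(marked))
```

Leaving spaces unmarked is a design choice; marking them too would give a continuous rule across word gaps where the renderer supports it.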

However, the discussion about Fraktur typefaces in the parent thread
made clear that reasoning in terms of roman vs italic is not really
interoperable, because in Roman typefaces, italic is polysemic, as
it’s used both for foreign words and for stress, while in Fraktur,
stress is denoted by spacing out, and foreign words, by using roman.
That would require a start and end pair of both Latin foreign word
controls and Latin stress controls.

As we see it from here, that would be even less implemented than
the Syriac abbreviation format control. It might be considered
Unicode conformant, since it would be part of the interoperable
digital representation of Latin-script languages, and its
use could be extended to other scripts.

But that is *not* what I’m asking for. First, we aren’t writing
in Fraktur any more, at least not in France nor in any other
language using preformatted superscript abbreviation indicators.
And second, if we need a document for full-fledged publishing,
we can use LaTeX or InDesign.

What I’m asking for is simply that people are enabled to write
in their language in a decent manner and can use that text in
any environment without postprocessing *and* without looking
downright bad.

That might please even those who are looking at draft style
with disdain.


Best regards,

Marcel


Re: Encoding italic

2019-01-15 Thread James Kass via Unicode



Enabling plain-text doesn't make rich-text poor.

People who regard plain-text with derision, disdain, or contempt have 
every right to hold and share opinions about what plain-text is *for* 
and in which direction it should be heading.  Such opinions should 
receive all the consideration they deserve.




Re: Encoding italic (was: A last missing link)

2019-01-15 Thread David Starner via Unicode
On Tue, Jan 15, 2019 at 1:47 PM James Kass via Unicode wrote:

>
> Although there probably isn't really any concerted effort to "keep
> plain-text mediocre", it can sometimes seem that way.
>

Dennis Ritchie allegedly replied to requests for new features in C with “If
you want PL/I, you know where to find it.” C is still an austere language,
and still well used, with users who want C++ or Java knowing where to find
them. If you want all the features of rich text, use rich text.

> Avant-garde enthusiasts are on the leading edge by definition. That's
> why they're known as trend setters.  Unicode exists because
> forward-looking people envisioned it and worked to make it happen.
> Regardless of one's perception of exuberance, Unicode turned out to be
> so much more than a fringe benefit.
>

Unicode exists because large corporations wanted to sell computers to users
around the world, and found supporting a million different character sets
was costly and buggy, and that users wanted to mix scripts in ways that a
single character set didn't support and ISO 2022 and similar solutions just
weren't cutting it.

That's a clear user story. People can use italics on computers without
problem. Twitter has chosen not to support italics on their platform, which
users have found hacky work-arounds for. That's not such a clear user
story; shouldn't Twitter add support for italics instead of changing every
system in the world?


Re: A last missing link for interoperable representation

2019-01-15 Thread Marcel Schneider via Unicode

On 15/01/2019 13:25, Philippe Verdy via Unicode wrote:


> Note that even if this NNBSP character is not mapped in a font, it
> should be rendered correctly by all modern renderers. The mapping
> is necessary only when a font design wants to tune its metrics,
> because its width varies between 1/8 and 1/6 em: the narrow space is
> a bit narrower in traditional English typography than in French, so
> typical English designs set it at about 1/8 em, typical French designs
> set it at 1/6 em, and neutral fonts may set it somewhere in the
> middle. The measure in em may however vary with some fonts, notably
> those using "narrow" or "wide" letters by default (because the font
> size in em indicates only its height), and in decorated/cursive styles
> (e.g. fonts with swashes need a higher line gap, and the em size of
> the font design may be smaller than for modern simplified styles for
> display).
>
> But a renderer should have no problem using a default metric for all
> whitespace characters, which actually don't need any glyph to be
> drawn: all that is needed is metrics. Everything else, including
> character properties like line breaking, is inferred by the renderer
> independently of the font and other per-language tuning, or controlled
> by styling effects applied on top of the font.


Indeed, since every Unicode implementation must rely on the character
properties, and given that keeping this library up to date is
straightforward and easy, there is really no point in displaying a
.notdef box in place of any whitespace character.
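As a small illustration of relying on character properties, this Python sketch uses the standard `unicodedata` module (which ships the UCD version of the Python build) to recognize the spaces discussed in this thread as General_Category Zs. Treating Zs as "lay out from metrics, draw no ink" is a simplifying assumption for illustration, not a rule quoted from the standard; `is_space_separator` is an illustrative name.

```python
import unicodedata

# Space characters that come up in this thread.
SPACES = ["\u0020", "\u00a0", "\u2007", "\u2008", "\u2009", "\u202f"]


def is_space_separator(ch: str) -> bool:
    """True if ch has General_Category Zs (space separator).

    Such characters need an advance width but no ink, so a renderer can lay
    them out from properties alone instead of drawing a .notdef box.
    """
    return unicodedata.category(ch) == "Zs"


if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for ch in SPACES:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch):22} Zs={is_space_separator(ch)}")
```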

As a consequence, prior to assessing the impact of the group separator
migration from (wrong) NBSP to (correct) NNBSP on implementations
and interoperability, Unicode would be well advised to start assessing
the impact of implementations (and, of course, the backing vendors) on
correct rendering of NNBSP, and on the related usability and
interoperability of the digital representation of those many locales
that should rely on NNBSP.



> A renderer may expand the kerning/approach if needed, for example to
> generate "hollow" or "shadow" effects, or to generate synthetic
> weights, including with "variable" font support. Typically the
> renderer will base the metrics of all missing/unmapped whitespaces
> on the metrics given to the normal SPACE or NBSP, which are
> typically both mapped to the same glyph; NNBSP will be synthesized
> easily using half the advance width of SPACE, and that's fine.
> Renderers can also synthesize all other whitespaces for ideographic
> usages, or will adapt the rendering if instructed to synthesize a
> monospaced variant: here there's a choice for NNBSP to be rendered
> like NBSP (typically for French, as it is normally a bit wider), or as
> a zero-width space like in English, or contextually, for example
> zero-width near punctuation or NBSP between letters/digits.


In a monospaced font, NNBSP normally has the width of a character,
but it was designed for proportional fonts, and there it must
not have the width of a digit, as that would annihilate the required
effect. The group separator must never have the width of a full digit,
not even of digit 1 in variable-width digits, but just a slight gap
ensuring correct readability, BTW also after the decimal separator
as per ISO 8.
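A minimal sketch of that grouping: NNBSP between groups of three digits, on both sides of the decimal separator. The function name `group_digits`, the comma decimal separator and the plain hyphen-minus for negatives are illustrative assumptions; production code would take these from locale data (e.g. CLDR) rather than hard-code them.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def group_digits(value: float, decimals: int = 2, decimal_sep: str = ",") -> str:
    """Format value with NNBSP between groups of three digits.

    The integer part is grouped from the right and the fractional part
    from the left, so 1234567.891 becomes 1 234 567,891 (thin gaps).
    """
    s = f"{abs(value):.{decimals}f}"
    int_part, _, frac_part = s.partition(".")
    # Integer part: the leading group holds 1 to 3 digits, the rest exactly 3.
    head = len(int_part) % 3 or 3
    groups = [int_part[:head]] + [int_part[i:i + 3] for i in range(head, len(int_part), 3)]
    out = NNBSP.join(groups)
    if frac_part:
        # Fractional part: groups of three counted from the decimal separator.
        out += decimal_sep + NNBSP.join(frac_part[i:i + 3] for i in range(0, len(frac_part), 3))
    return ("-" if value < 0 else "") + out


if __name__ == "__main__":
    print(group_digits(1234567.891, decimals=3))
    print(group_digits(3.14159265, decimals=6))
```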

Between punctuation, NNBSP mustn’t be zero-width, as it is used in
English to separate closing single and double quotation marks when
a nested quotation ends the first-level quotation. I don’t think
that English uses NNBSP elsewhere around punctuation except
around dashes if appropriate according to the applied style manual, but
Canadian French does, contrary to an urban legend saying it doesn’t.
It only prefers not to space off punctuation *if* NNBSP is
unavailable. That is further proof of the inappropriateness of
NBSP for the purpose of spacing off tall punctuation marks.



> Fonts only specify defaults that alter the rendering produced by a
> renderer, but a renderer is not required to use all the info and all
> the glyphs in a specific font; it has to adapt to the context and
> choose what is more relevant and which kind of data it recognizes and
> implements/uses at runtime. The font just provides the best settings
> according to the font designer, if all features are enabled, but most
> work is done by the renderer (and fonts are completely unaware of
> the actual encoding of documents; fonts are only a database
> containing multiple features/settings, all of them being optional
> and selectable individually).


Good point, indeed. Currently we are too concerned with fonts,
while actually it’s all up to the renderer. Today as most devices
are permanently connected to the internet, a decent rendering engine
could as well grab missing glyphs from an online repository, at
Google Fonts or at the application vendor’s website. All that
missing-glyph-whining seems completely outdated and very detrimental
to the user experience. It is so anachronistic that people shouldn’t
be surprised about suspicions of intentional bugs for the purpose of
unlawful lobbying by messing up user experience outside of certain
DTP a

Re: Encoding italic (was: A last missing link)

2019-01-15 Thread James Kass via Unicode



Although there probably isn't really any concerted effort to "keep 
plain-text mediocre", it can sometimes seem that way.


As we've been told repeatedly, just because something has been done over 
and over again doesn't mean that there's a precedent for it.


Using spans of text as a general indicator of rich-text seems reasonable 
at first blush.  But selected spans can also be copy/pasted (relocated), 
which is not stylistic at all.  Spans of text can be selected to apply 
casing, which is often seen as non-stylistic.  In applications such as 
BabelPad, spans of text can be converted to-and-from various forms of 
Unicode references and encodings.  Spans of text can be transliterated, 
moved, or deleted. In short, selecting a span of text only means that 
the user is going to apply some kind of process to that span.
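The point that a selected span is just the argument to some process can be shown in a few lines of Python: the same splice helper applies casing or a code-point conversion, neither of which is stylistic. The helper names and the `U+XXXX` format are illustrative; they are not BabelPad's actual conversions.

```python
from typing import Callable


def apply_to_span(text: str, start: int, end: int, process: Callable[[str], str]) -> str:
    """Run `process` on text[start:end] and splice the result back in."""
    return text[:start] + process(text[start:end]) + text[end:]


def to_u_notation(s: str) -> str:
    """Render each code point as a space-separated U+XXXX reference."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def from_u_notation(s: str) -> str:
    """Inverse of to_u_notation."""
    return "".join(chr(int(tok[2:], 16)) for tok in s.split())


if __name__ == "__main__":
    text = "plain text"
    print(apply_to_span(text, 0, 5, str.upper))       # PLAIN text
    print(apply_to_span(text, 6, 10, to_u_notation))  # plain U+0074 U+0065 U+0078 U+0074
```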


Avant-garde enthusiasts are on the leading edge by definition. That's 
why they're known as trend setters.  Unicode exists because 
forward-looking people envisioned it and worked to make it happen. 
Regardless of one's perception of exuberance, Unicode turned out to be 
so much more than a fringe benefit.




wws dot org

2019-01-15 Thread Johannes Bergerhausen via Unicode
Dear list,

I am happy to report that www.worldswritingsystems.org is now online.

The web site is a joint venture by

— Institut Designlabor Gutenberg (IDG), Mainz, Germany,
— Atelier National de Recherche Typographique (ANRT), Nancy, France and
— Script Encoding Initiative (SEI), Berkeley, USA.

For every known script, we researched and designed a reference glyph.

You can sort these 292 scripts by Time, Region, Name, Unicode version and 
Status.
Exactly half of them (146) are already encoded in Unicode.

Here you can find more about the project:
www.youtube.com/watch?v=CHh2Ww_bdyQ 

And here is a link to the poster:
https://shop.designinmainz.de/produkt/the-worlds-writing-systems-poster/

All the best,
Johannes




↪ Prof. Bergerhausen
Hochschule Mainz, School of Design, Germany
www.designinmainz.de
www.decodeunicode.org

Re: Encoding italic (was: A last missing link)

2019-01-15 Thread wjgo_10...@btinternet.com via Unicode

Hi

You are the gentleman who kindly made the Gentium typeface open source.

 Thank you for your generous gift to the world.

> Use of variation selectors, a single character modifier, or combining
> characters also seem to be less useful options, as they act at the
> individual character level and are highly impractical. They also violate
> the key concept that italics are a way of marking a span of text as
> 'special' - not individual letters. Matched punctuation works the same
> way and is a good fit for italic.


Italics works differently from matched punctuation marks in that with 
italics there is a change to each glyph whereas with matched punctuation 
there is no change to the glyphs between the matched punctuation marks.


That difference leads to the significant difficulty that there are thus
two competing forces here.


One of those forces is what you have stated about the nature of italics. 
The other of those forces is that Unicode is not stateful.


Years ago I encoded some Private Use Area codes for such features as 
italics, with a start character and an end character to surround a span 
of text that would then be rendered in italics.  As a result of 
discussion and advice I learned that such characters are not acceptable 
for encoding into regular Unicode because the effect would be stateful. 
So yes, the method that I suggested and for which James Kass suggested 
an enhancement is peculiar when viewed against the theory of the way 
that italics are used, but neither the method nor the enhanced method is 
stateful and that is an important feature of them.


Now it would be possible for a software application program to have a 
feature for composing plain text where a span of text may be highlighted 
by a user of the software application program and every character 
(except perhaps spaces?) within that span of text has, at the click of a 
button, a VS14 character inserted after it.


I remember that when handsetting metal type the same space sorts were 
used with italics as with roman.


There could also be a button that could remove all VS14 characters, if 
any, from within a highlighted span of text.
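The two buttons described above can be sketched as follows. This is only an illustration: Unicode has not defined any italic variation sequences with VS14 (U+FE0D), `add_vs14` and `remove_vs14` are hypothetical names, and skipping whitespace follows the "except perhaps spaces?" suggestion.

```python
VS14 = "\ufe0d"  # VARIATION SELECTOR-14


def add_vs14(text: str, start: int, end: int) -> str:
    """'Italic' button: put VS14 after each non-space character in text[start:end].

    Existing VS14s in the span are removed first, so pressing twice is harmless.
    """
    span = text[start:end].replace(VS14, "")
    marked = "".join(ch if ch.isspace() else ch + VS14 for ch in span)
    return text[:start] + marked + text[end:]


def remove_vs14(text: str, start: int, end: int) -> str:
    """'Roman' button: drop every VS14 inside text[start:end]."""
    return text[:start] + text[start:end].replace(VS14, "") + text[end:]


if __name__ == "__main__":
    s = "my red slippers"
    t = add_vs14(s, 3, 6)  # select "red"
    print(len(s), len(t))
    print(remove_vs14(t, 0, len(t)) == s)
```

Offsets refer to the string as it currently is, so after marking, an editor has to remap the selection: the span grows by one code point per marked character.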


So, for someone typesetting plain text and viewing plain text the effect 
could look to be in accordance with how you consider italics should be 
encoded, though for plain text interchange the encoding would still be 
by using a VS14 character after each character that one wishes to become 
displayed italicized.


William Overington
Tuesday 15 January 2019



Encoding italic (was: A last missing link)

2019-01-15 Thread Victor Gaultney via Unicode
I've been alerted to this thread by a friend, so just rejoined in order 
to respond. I'm currently doing research into italics.


Some of the confusion and disagreement about italics centers around 
whether it is typographic markup or textual content. Both historically 
and currently italics can be used for either, but can clearly change the 
meaning of a word or phrase*.  It also has a different semantic meaning 
than bold.** It is not just rich text, nor parallel to casing. It works 
differently, and most like the use of matching punctuation (parentheses, 
brackets, quotation marks).


Italics are sometimes used to indicate stress, although that is only one 
use. Stress is like a phonetic sound. It is represented in writing 
systems in different ways. However a writing system text encoding 
standard relates to the visual symbols and the rules of their behaviour 
rather than to the sound itself. Italicised text is visually different, 
and that difference can have a variety of meanings.


It would make sense for Unicode to encode the visual difference that 
marks those meanings (such as stress), just as it does with punctuation. 
Quotation marks, for example, are visually represented in different ways 
depending on the language, but Unicode does have characters that are used 
to indicate that 'this is a quote'. So it makes no sense for Unicode to 
encode 'stress' as a character, but it *may* make theoretical sense to 
encode 'italic begin' and 'italic end' characters, just as we do 
parentheses, brackets, quotation marks, etc. This would allow for the 
use of italic in non-styled environments (text messages, social media, 
etc.).
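To make the begin/end idea concrete, here is a Python sketch of how software might split text into roman and italic runs from paired markers. Unicode has encoded no such characters, so two Private Use Area code points (U+E000, U+E001) stand in for them; those values and the name `italic_spans` are assumptions. The running `depth` counter is exactly the kind of state that the statefulness concerns raised in this thread are about.

```python
from typing import List, Tuple

# Hypothetical stand-ins: Unicode has NOT encoded italic begin/end characters.
ITALIC_BEGIN = "\ue000"
ITALIC_END = "\ue001"


def italic_spans(text: str) -> List[Tuple[str, bool]]:
    """Split text into (run, is_italic) pairs using paired markers.

    A stray END is ignored; an unclosed BEGIN runs to the end of the string.
    """
    runs: List[Tuple[str, bool]] = []
    buf: List[str] = []
    depth = 0

    def flush() -> None:
        # Emit pending characters as one run, italic if any BEGIN is open.
        if buf:
            runs.append(("".join(buf), depth > 0))
            buf.clear()

    for ch in text:
        if ch == ITALIC_BEGIN:
            flush()
            depth += 1
        elif ch == ITALIC_END:
            flush()
            depth = max(0, depth - 1)
        else:
            buf.append(ch)
    flush()
    return runs


if __name__ == "__main__":
    text = f"I've lost my {ITALIC_BEGIN}red{ITALIC_END} slippers"
    print(italic_spans(text))
```

Nested markers here simply deepen the italic run; a renderer following the typographic convention of switching back to roman inside italic would test `depth % 2` instead. Either way, joining the runs recovers the unmarked text, so search is unaffected.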


BTW - encoding the begin/end of italic would be very different from HTML 
semantic tags that attempt to encode meaning. Like punctuation, it only 
encodes the visual distinction, not the meaning.


Use of variation selectors, a single character modifier, or combining 
characters also seem to be less useful options, as they act at the 
individual character level and are highly impractical. They also violate 
the key concept that italics are a way of marking a span of text as 
'special' - not individual letters. Matched punctuation works the same 
way and is a good fit for italic.


Although italic is a deeply Latin script concept, people do want to 
apply it to non-Latin text (with sometimes limited sense and success). 
Encoding two punctuation characters would allow use across scripts, in 
the same way that quotation marks are sometimes used.


My current research in italic won't get published publicly until 2020, 
however I gave a talk at ATypI Montreal about the nature of italic 
(https://www.youtube.com/watch?v=4vlFxed22Sg). I have an unpublished 
paper on italic but can't share it publicly (due to image rights). 
Contact me if you would like to see a private copy.


Victor Gaultney

* David Crystal's famous example is that these two sentences mean 
different things: 'I've lost my red slippers' and 'I've lost my /red/ 
slippers' (as opposed to my blue ones). Crystal, David. 1994. The 
Cambridge encyclopedia of language (Cambridge University Press), pp. 13-14.


** Vachek, Josef, and Philip A Luelsdorff. 1989. Written language 
revisited (Amsterdam: Benjamins), pp. 45-48.




Re: A last missing link for interoperable representation

2019-01-15 Thread wjgo_10...@btinternet.com via Unicode

Martin J. Dürst wrote:

> So rich text technology is already way ahead when it comes to styled
> text. Do we want to encode background-color variant selectors in
> Unicode? If yes, how many?


Yes.

You would only need one.

Background colour was a feature of teletext in the United Kingdom from 
1976. It was very effective in its application.


In teletext, there were seven choices of foreground colour (red, green, 
yellow, blue, magenta, cyan, white), the default background was black.


The New Background control character caused the background colour to 
become the same as the current foreground colour in which text was being 
displayed. One could then change the foreground colour.


There was also a Black Background control code. This was necessary 
because neither text nor graphics could be black in teletext.


In teletext those control codes were stateful and applied until a change 
or to the end of the line of text, whichever came first.
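The teletext behaviour described above is a small state machine, sketched below in Python. The tuple tokens are symbolic stand-ins, not teletext byte values; real teletext spacing attributes also occupy a character cell, which this sketch ignores. The colour list comes from the post; starting each row as white on black is an assumption about the default state, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

COLOURS = {"red", "green", "yellow", "blue", "magenta", "cyan", "white"}

# A token is literal text or a control: ("fg", colour), ("new_bg",), ("black_bg",).
# These are symbolic stand-ins, not teletext control bytes.
Token = Union[str, Tuple[str, ...]]


@dataclass
class Cell:
    ch: str
    fg: str
    bg: str


def render_row(tokens: List[Token]) -> List[Cell]:
    """Apply controls left to right; state lasts only until the end of the row."""
    fg, bg = "white", "black"  # assumed start-of-row state
    cells: List[Cell] = []
    for tok in tokens:
        if isinstance(tok, str):
            cells.extend(Cell(c, fg, bg) for c in tok)
        elif tok[0] == "fg" and tok[1] in COLOURS:
            fg = tok[1]  # black is not offered as a foreground colour
        elif tok[0] == "new_bg":
            bg = fg  # background takes the current foreground colour
        elif tok[0] == "black_bg":
            bg = "black"
        else:
            raise ValueError(f"unknown control {tok!r}")
    return cells


def render_page(rows: List[List[Token]]) -> List[List[Cell]]:
    # Each row starts from the default state, so nothing carries past a line end.
    return [render_row(row) for row in rows]
```

For example, `[("fg", "yellow"), ("new_bg",), ("fg", "blue"), "Hi"]` yields blue text on a yellow background: the New Background control copies the colour that was current, which is why the foreground can then be changed freely.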


So, given that Unicode is starting to encode colour choices for emoji 
and black is in the set of colours - and that might possibly extend to 
choosing colour for text - if Unicode were to encode CHANGE BACKGROUND 
COLOUR then the background colour could become the current foreground 
colour, even if that chosen foreground colour had just been selected and 
not actually used to colour text.


The implementation in Unicode need not be stateful.


> [Hint: The last two questions are rhetorical.]


Maybe that was the intention, but the questions were asked and the 
concept is an interesting possibility for implementation.


William Overington

Tuesday 15 January 2019




Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
Note that even if this NNBSP character is not mapped in a font, it should
be rendered correctly by all modern renderers. The mapping is necessary
only when a font design wants to tune its metrics, because its width varies
between 1/8 and 1/6 em: the narrow space is a bit narrower in traditional
English typography than in French, so typical English designs set it at
about 1/8 em, typical French designs set it at 1/6 em, and neutral fonts may
set it somewhere in the middle. The measure in em may however vary with
some fonts, notably those using "narrow" or "wide" letters by default
(because the font size in em indicates only its height), and in
decorated/cursive styles (e.g. fonts with swashes need a higher line gap,
and the em size of the font design may be smaller than for modern
simplified styles for display).

But a renderer should have no problem using a default metric for all
whitespace characters, which actually don't need any glyph to be drawn:
all that is needed is metrics. Everything else, including character
properties like line breaking, is inferred by the renderer independently of
the font and other per-language tuning, or controlled by styling effects
applied on top of the font.

A renderer may expand the kerning/approach if needed, for example to
generate "hollow" or "shadow" effects, or to generate synthetic weights,
including with "variable" font support. Typically the renderer will base
the metrics of all missing/unmapped whitespaces on the metrics given to
the normal SPACE or NBSP, which are typically both mapped to the same glyph;
NNBSP will be synthesized easily using half the advance width of SPACE, and
that's fine. Renderers can also synthesize all other whitespaces for
ideographic usages, or will adapt the rendering if instructed to synthesize
a monospaced variant: here there's a choice for NNBSP to be rendered like
NBSP (typically for French, as it is normally a bit wider), or as a
zero-width space like in English, or contextually, for example zero-width
near punctuation or NBSP between letters/digits.
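The half-SPACE fallback and the 1/8 to 1/6 em range can be checked with a little arithmetic. The font values below (2048 units per em, a SPACE advance of 512 units) are hypothetical, and `nnbsp_fallback` and `in_typical_range` are illustrative names.

```python
from fractions import Fraction

# Typical NNBSP widths given above: about 1/8 em (English) to 1/6 em (French).
TYPICAL_MIN = Fraction(1, 8)
TYPICAL_MAX = Fraction(1, 6)


def nnbsp_fallback(space_advance: int) -> int:
    """Synthesize an NNBSP advance as half the SPACE advance, in font units."""
    return space_advance // 2


def in_typical_range(advance: int, units_per_em: int) -> bool:
    """True if advance/units_per_em lies between 1/8 em and 1/6 em inclusive."""
    return TYPICAL_MIN <= Fraction(advance, units_per_em) <= TYPICAL_MAX


if __name__ == "__main__":
    upem, space = 2048, 512  # hypothetical font metrics
    nnbsp = nnbsp_fallback(space)
    print(nnbsp, Fraction(nnbsp, upem), in_typical_range(nnbsp, upem))  # 256 1/8 True
```

With a narrower SPACE, say 1/5 em, the half-space fallback would come out at 1/10 em, below the typical range, which is the kind of case where a font designer would want to map NNBSP explicitly.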

Fonts only specify defaults that alter the rendering produced by a
renderer, but a renderer is not required to use all the info and all the
glyphs in a specific font; it has to adapt to the context and choose what is
more relevant and which kind of data it recognizes and implements/uses at
runtime. The font just provides the best settings according to the font
designer, if all features are enabled, but most work is done by the
renderer (and fonts are completely unaware of the actual encoding of
documents; fonts are only a database containing multiple features/settings,
all of them being optional and selectable individually).

If your fonts behave incorrectly on your system because they do not map any
glyph for NNBSP, don't blame the font or Unicode for this problem; blame
the renderer (or the application or OS using it). Maybe they are very
outdated and were not aware of these features; they are probably based on
old versions of Unicode when NNBSP was not yet present, even though it had
been requested for a very long time, at least for French and even English,
before Unicode even existed, and long before Mongolian was encoded (only in
Unicode and not in any known supported legacy charset). Mongolian was
specified by borrowing the same NNBSP already designed for Latin, because
the Mongolian space had no known specific behavior: the encoded whitespaces
in Unicode are completely script-neutral; they are generic, and even
BiDi-neutral, and all of them are usable with any script.


Re: A last missing link for interoperable representation

2019-01-15 Thread Marcel Schneider via Unicode

On 15/01/2019 10:24, Philippe Verdy via Unicode wrote:


> On Mon, Jan 14, 2019 at 20:25, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
>
>> On 14/01/2019 06:08, James Kass via Unicode wrote:
>>>
>>> Marcel Schneider wrote,
>>>
>>>> There is a crazy typeface out there, misleadingly called 'Courier
>>>> New', as if the foundry didn’t anticipate that at some point it
>>>> would be better called "Courier Obsolete". ...
>>>
>>> 𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.
>>>
>>> (Had to use mark-up for that “span” of a single letter in order to
>>> indicate the proper letter form.  But the plain-text display looks
>>> crazy with that HTML jive in it.)
>>
>> I apologize for seeming to question the font name 𝑝𝑒𝑟 𝑠𝑒 while targeting
>> only the fact that this typeface has not been updated to support NNBSP. It
>> just looks like the grand name is now misused to make people believe that
>> if **this** great font does not support NNBSP, it has a good reason not to,
>> and we should keep people from using that “exotic whitespace” otherwise
>> than “intended,” i.e. for Mongolian. Since fortunately TUS started backing
>> its use in French (2014)
>
> This is not for Mongolian, and French has wanted this space for a long
> time; it has even had a use in English for centuries in fine typography.
> So no, NNBSP is definitely NOT "exotic whitespace". It's just that it was
> forgotten in the early stages of computing with legacy 8-bit encodings, but
> it should have been in Unicode since the beginning, as its existence is
> proven long before the computing age (before ASCII, or even before Baudot
> and telegraphic systems). It has always been used by typographers; it has
> centuries of tradition in publishing. And it has always been recommended,
> and still is today, for French by all book/paper publishers.

Many thanks for bringing this to the point. So the case is even worse, as Unicode
deliberately skipped the non-breaking thin space while thinking of encoding the whole
range of other typographic spaces, even with duplicate encoding of en and em spaces, and
not forgetting those old-fashioned tabular spaces and dash: figure space and dash, and
punctuation space. In this particular context and with all that historic practice
background, what else but malice (supposedly inspired by an unlawful and exuberant DTP
vendor) could drive people not to define the line-breaking property value of U+2008
PUNCTUATION SPACE as "GL", while they did define it so for U+2007 FIGURE SPACE?
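The property difference in question can be seen in a small lookup. Python's `unicodedata` does not expose Line_Break, so the values below are hard-coded from LineBreak.txt (UAX #14) and should be checked against the current data file before use; `is_glue` is an illustrative name.

```python
# Excerpt of Line_Break values (UAX #14, LineBreak.txt); verify before use.
LINE_BREAK = {
    0x0020: "SP",  # SPACE
    0x00A0: "GL",  # NO-BREAK SPACE
    0x2007: "GL",  # FIGURE SPACE
    0x2008: "BA",  # PUNCTUATION SPACE
    0x2009: "BA",  # THIN SPACE
    0x202F: "GL",  # NARROW NO-BREAK SPACE
}


def is_glue(ch: str) -> bool:
    """True if ch has Line_Break class GL (non-breaking 'glue')."""
    return LINE_BREAK.get(ord(ch)) == "GL"


if __name__ == "__main__":
    for cp, cls in LINE_BREAK.items():
        print(f"U+{cp:04X} {cls} glue={is_glue(chr(cp))}")
```

Class BA ("break after") is what makes PUNCTUATION SPACE unsuitable as a non-breaking separator, whereas FIGURE SPACE and NNBSP are GL.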

Here is also the still outdated wording of UAX #14 wrt NNBSP, Mongolian and 
French:

   […] NARROW NO-BREAK SPACE is used in Mongolian. The
MONGOLIAN VOWEL SEPARATOR acts like a NARROW NO-BREAK SPACE in its line breaking
behavior. It additionally affects the shaping of certain vowel characters as
described in *Section 13.5, Mongolian*, of [Unicode].

   NARROW NO-BREAK SPACE is a narrow version of 
NO-BREAK SPACE, which has exactly the same line breaking behavior, but with a 
narrow display width. It is regularly used in Mongolian in certain grammatical 
contexts (before a particle), where it also influences the shaping of the 
glyphs for the particle. In Mongolian text, the NARROW NO-BREAK SPACE is 
typically displayed with one third the width of a normal space character.

   When NARROW NO-BREAK SPACE occurs in French text, it 
should be interpreted as an “espace fine insécable”.


“When […] it should be interpreted as […]” is a pure insult. NARROW NO-BREAK SPACE *is*, 
at the very least, the French "espace fine insécable" *and* the Mongolian 
whatever-it-is-called-in-Mongolian *and* the group separator, aka triad separator, in 
*all* locales following the SI and ISO recommendation to group digits with spaces, not 
with any punctuation.
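
To make the digit-grouping point concrete, here is a minimal sketch, assuming Python, that groups the digits of an integer in triads separated by U+202F. `group_digits` is a hypothetical helper; Python's format mini-language only offers ',' or '_' as grouping characters, so the sketch groups with ',' and then substitutes NNBSP.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def group_digits(n: int) -> str:
    """Group the digits of an integer in triads separated by NNBSP."""
    # Python's format spec only offers ',' or '_' as grouping characters,
    # so group with ',' first and then substitute the narrow no-break space.
    return f"{n:,}".replace(",", NNBSP)


print(unicodedata.name(NNBSP))
print(repr(group_digits(1234567)))  # '1\u202f234\u202f567'
```

This sketch handles integers only; because U+202F has the line break value GL, the grouped number should not be split across lines.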

As that misleading section will hopefully be edited, here is the link to the 
quoted version:
https://www.unicode.org/reports/tr14/tr14-41.html#DescriptionOfProperties


Also, I would like, or rather I need, to kindly ask knowledgeable List Members to 
correct the following statement *if* it is wrong:

   If the Unicode Standard had been set up in an unbiased way, U+2008 
PUNCTUATION SPACE would have been given the line break property value "GL".

Perhaps the following would also be true:

   If the Unicode Standard had been set up in an unbiased way, 
there would be a NARROW NO-BREAK SPACE encoded in the range U+2000..U+200F.


Thanks in advance to Philippe Verdy and any other knowledgeable List Members for 
staying or getting in touch and for (continuing to) post feedback.

I am not editing the subject line, nor spinning off a new thread, given that when I 
launched this one I sincerely believed that the issues with NARROW NO-BREAK 
SPACE and with preformatted superscript abbreviation indicators for 
interoperable representation of French and numerous other languages (part of

Re: A last missing link for interoperable representation

2019-01-15 Thread Julian Bradfield via Unicode
On 2019-01-15, Philippe Verdy via Unicode  wrote:
> This is not only for Mongolian: French has wanted this space for a long time,
> and it has been used even in English for centuries in fine typography.
> So no, NNBSP is definitely NOT "exotic whitespace". It's just that it was
> forgotten in the early stages of computing with legacy 8-bit encodings, but
> it should have been in Unicode since the beginning, as its existence long
> predates the computing age (before ASCII, or even before Baudot
> and telegraphic systems). It has always been used by typographers; it has
> centuries of tradition in publishing. And for French it has always been
> recommended, and still is today, to all publishers of books and papers.

Do you expect people to encode all the variable justification spaces
between words by combining all the (numerous) spaces already available
in Unicode?
And how about the kerning between letters? If spacing of punctuation
is to be encoded instead of left to display algorithms, shouldn't you
also encode the kerns instead of leaving them to the font display
technology?

Oh, and what about dropped initials? They have been used in both
manuscripts and typography for many centuries - surely we must encode
them?

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: A last missing link for interoperable representation

2019-01-15 Thread Hans Åberg via Unicode


> On 15 Jan 2019, at 02:18, Richard Wordingham via Unicode 
>  wrote:
> 
> On Mon, 14 Jan 2019 16:02:05 -0800
> Asmus Freytag via Unicode  wrote:
> 
>> On 1/14/2019 3:37 PM, Richard Wordingham via Unicode wrote:
>>> On Tue, 15 Jan 2019 00:02:49 +0100
>>> Hans Åberg via Unicode  wrote:
>>> 
>>>> On 14 Jan 2019, at 23:43, James Kass via Unicode
>>>>  wrote:
>>>> 
>>>>> Hans Åberg wrote,
>>>>> 
>>>>>> How about using U+0301 COMBINING ACUTE ACCENT: 𝑝𝑎𝑠𝑠𝑒́  
>>>>> 
>>>>> Thought about using a combining accent.  Figured it would just
>>>>> display with a dotted circle but neglected to try it out first.  It
>>>>> actually renders perfectly here.  /That's/ good to know.  (smile)  
>>>> 
>>>> It is a bit off here. One can try math, too: the derivative of 𝛾(𝑡)
>>>> is 𝛾̇(𝑡).
>>> 
>>> No it isn't.  You should be using a spacing character for
>>> differentiation. 
>> 
>> Sorry, but there may be different conventions. The dot / double-dot
>> above is definitely common usage in physics.

Also in differential geometry, as for curves.

>> A./
> 
> Apologies.  It was positioned in the parenthesis, and it looked like a
> misplaced U+0301.

In MacOS, one can drop the combined character into the character table, and see 
that it is U+0307 COMBINING DOT ABOVE.
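
The same identification can also be done programmatically. A minimal sketch, assuming Python's standard `unicodedata` module; `describe` is a hypothetical helper, and the string is written with escapes (U+1D6FE MATHEMATICAL ITALIC SMALL GAMMA followed by U+0307) so that no font rendering is involved.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


gamma_dot = "\U0001D6FE\u0307"  # mathematical italic gamma + combining dot above
for line in describe(gamma_dot):
    print(line)

# A combining mark has a nonzero canonical combining class (230 = above).
print(unicodedata.combining("\u0307"))
# No precomposed form exists, so NFC leaves the sequence as two code points.
print(unicodedata.normalize("NFC", gamma_dot) == gamma_dot)
```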

It comes out right when typeset in ConTeXt.





Re: A last missing link for interoperable representation

2019-01-15 Thread Philippe Verdy via Unicode
On Mon, 14 Jan 2019 at 20:25, Marcel Schneider via Unicode
<unicode@unicode.org> wrote:

> On 14/01/2019 06:08, James Kass via Unicode wrote:
> >
> > Marcel Schneider wrote,
> >
> >> There is a crazy typeface out there, misleadingly called 'Courier
> >> New', as if the foundry didn’t anticipate that at some point it
> >> would be better called "Courier Obsolete". ...
> >
> > 𝐴𝑟𝑡 𝑛𝑜𝑢𝑣𝑒𝑎𝑢 seems a bit 𝑝𝑎𝑠𝑠é nowadays, as well.
> >
> > (Had to use mark-up for that “span” of a single letter in order to
> > indicate the proper letter form.  But the plain-text display looks
> > crazy with that HTML jive in it.)
> >
>
> I apologize for seeming to question the font name 𝑝𝑒𝑟 𝑠𝑒 while
> targeting only the fact that this typeface is not updated to support the . It just
> looks like the grand name is now misused to make people believe that if
> **this** great font is unsupporting , it has a good reason to do so,
> and we should keep people from using that “exotic whitespace” other than as
> “intended,” i.e. for Mongolian. Since fortunately TUS started backing its use
> in French (2014)
>

This is not only for Mongolian: French has wanted this space for a long time,
and it has been used even in English for centuries in fine typography.
So no, NNBSP is definitely NOT "exotic whitespace". It's just that it was
forgotten in the early stages of computing with legacy 8-bit encodings, but
it should have been in Unicode since the beginning, as its existence long
predates the computing age (before ASCII, or even before Baudot
and telegraphic systems). It has always been used by typographers; it has
centuries of tradition in publishing. And for French it has always been
recommended, and still is today, to all publishers of books and papers.


Re: A last missing link for interoperable representation

2019-01-15 Thread Marcel Schneider via Unicode

On 15/01/2019 03:02, Asmus Freytag via Unicode wrote:

> On 1/14/2019 5:41 PM, Mark E. Shoulson via Unicode wrote:
>
>> On 1/14/19 5:08 AM, Tex via Unicode wrote:
>>
>>> This thread has gone on for a bit and I question if there is any more light 
>>> that can be shed.
>>>
>>> BTW, I admit to liking Asmus’s definition of functions that span text as a 
>>> definition of, or criterion for, rich text.
>>
>> Me too.  There are probably some exceptions or weird corner-cases, but it seems 
>> to be a really good encapsulation of the distinction which I had never seen 
>> before.
>
> ** blush **
>
> A./



I did like it too, and I was really amazed that the issue could be boiled down to such 
a handy shibboleth. It was only when I looked harder that I could no longer help seeing 
it as a mere rewording of current practice. That is, if we’re using markup (which 
typically acts on spans and other elements), it’s rich text; if we’re using characters, 
it’s plain text. The reason I changed my mind is that the new shibboleth can be misused 
to relegate to the realm of rich text some feature of a writing system, like using 
superscript as ordinal indicators (English "3ʳᵈ", French "2ᵉ" [order] or "2ⁿᵈ" [rank], 
Italian "1ᵃ" or, in Latin-1, "1ª", the latter being used in German as a narrow form of 
"prima" that has special semantics there ["top quality" or "great!"]), only on the basis 
that it is currently emulated using rich text by declaring that "ᵉ" is, or “should” be, 
a span with superscript markup, so that we end up with "2e".

As I (too) briefly pointed out in a previous reply, that is not what we should 
end up with. Abbreviation indicators in Latin script are a case for a single-character 
solution, even though multiple characters may be involved in a single 
instance. We can also have inner uppercase, aka camelcase, which cannot be 
handled by the titlecase attribute. We’re clearly in the realm of plain text, 
and any other solution may be called an emulation, or a legacy workaround, but 
not a Unicode-conformant interoperable representation.
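
For reference, the superscript letters cited above are distinct plain-text characters, each carrying a <super> compatibility decomposition. The following sketch, assuming Python's standard `unicodedata` module (the `EXAMPLES` labels and the `superscript_report` helper are illustrative, not from the thread), lists them and shows that NFKC normalization folds "2ᵉ" to "2e", the same string that results from dropping superscript markup.

```python
import unicodedata

# Ordinal indicators mentioned above, spelled with escapes.
EXAMPLES = {
    "English 3rd": "3\u02b3\u1d48",
    "French 2e": "2\u1d49",
    "French 2nd": "2\u207f\u1d48",
    "Italian 1a": "1\u1d43",
    "Latin-1 1a": "1\u00aa",
}


def superscript_report(s: str) -> list[tuple[str, str]]:
    """Name and decomposition mapping of each non-digit code point in s."""
    return [
        (unicodedata.name(ch), unicodedata.decomposition(ch))
        for ch in s
        if not ch.isdigit()
    ]


for label, s in EXAMPLES.items():
    folded = unicodedata.normalize("NFKC", s)
    print(f"{label}: {s!r} -> NFKC {folded!r}", superscript_report(s))
```

In other words, the characters survive plain-text interchange as such, but any process that applies compatibility normalization loses the superscript distinction.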

Also, please note the presence in Unicode of U+070F SYRIAC ABBREVIATION MARK, 
a format control… There are probably also other format controls in other 
scripts performing much the same job. Remember when a similar solution was 
suggested for Latin script on this List…

Best regards,

Marcel