Re: What are the present criteria for the encoding of characters that have been fairly recently invented please?

2011-08-18 Thread Karl Pentzlin
On Wednesday, 17 August 2011 at 23:00, Doug Ewell wrote:

>> http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4085.pdf
>> quote
>>   Indicators for such benefit for the user can be:
>>   – Evidence of actual use.
>>   – Evidence of prevention of an otherwise probable actual use due to the 
>> lack of encoding.
>>   – Conformance with or compliance to another standard.
>> end quote

DE> The document cited is a WG2 contribution from the German NB (Karl
DE> Pentzlin?), not a policy statement by UTC or WG2.

The quoted indicators for benefit were part of a concern of the German
NB regarding the Wingding/Webding proposals. The concern expressed in
WG2 N4085 is that some characters proposed there conform neither to the
policy statements by UTC or WG2, nor to the indicators of benefit
which the German NB would accept as an additional reason to encode
Wingding/Webding characters beyond the formal policies of UTC and WG2.

- Karl Pentzlin








Re: What are the present criteria...

2011-08-18 Thread Doug Ewell
Karl Pentzlin  wrote:

> The quoted indicators for benefit were part of a concern of the German
> NB regarding the Wingding/Webding proposals. The concern expressed in
> WG2 N4085 is that some characters proposed there conform neither to
> the policy statements by UTC or WG2, nor to the indicators of benefit
> which the German NB would accept as an additional reason to encode
> Wingding/Webding characters beyond the formal policies of UTC and WG2.

Nevertheless, N4085 is a German NB document, the criteria in question
are those suggested by the German NB and not WG2 (and the document makes
note of this distinction), and it is an error to portray this passage as
representing either a change or a lack of clarity in UTC or WG2 policy.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell






Re: What are the present criteria...

2011-08-18 Thread Asmus Freytag

On 8/18/2011 7:29 AM, Doug Ewell wrote:

> Karl Pentzlin  wrote:
>
>> The quoted indicators for benefit were part of a concern of the German
>> NB regarding the Wingding/Webding proposals. The concern expressed in
>> WG2 N4085 is that some characters proposed there conform neither to
>> the policy statements by UTC or WG2, nor to the indicators of benefit
>> which the German NB would accept as an additional reason to encode
>> Wingding/Webding characters beyond the formal policies of UTC and WG2.
>
> Nevertheless, N4085 is a German NB document, the criteria in question
> are those suggested by the German NB and not WG2 (and the document makes
> note of this distinction), and it is an error to portray this passage as
> representing either a change or a lack of clarity in UTC or WG2 policy.


Karl makes no such claim. The document states that 2093-2096 appear to 
be in violation of the character glyph model. I believe that's the 
section (or one of the sections) in the document that Karl summarizes 
here as "policy statements by UTC or WG2" - at least it would fit.


Anyway, it's more useful to focus on the actual concerns, not on
whether Karl summarized them correctly in his email.


The German NB introduces the concept of "indicator" of "benefit [to] the 
user", and then defines that as:

- evidence of actual use
- evidence that it's likely a wrong character might be used for lack of
  an encoded character
- conformance to other standards

(I've slightly rephrased for clarity).

I have several problems with this approach.

First, these "indicators" are rather haphazardly compiled. Overwhelming
evidence of plain text use and conformance requirements are already
recognized as valid reasons to encode characters (not just symbols).
They do not, however, help in evaluating those proposals where more
nuanced judgement is required. The second element, that the wrong
character might be mistakenly used, is of overriding concern only in
particular cases where questions of unification or disambiguation need
to be decided.


Second, it's really unsatisfactory if each NB has its own criteria for
when to add characters to the standard, and it's especially unsettling
when such criteria seem to be applied ad hoc to a given repertoire.
WG2 and Unicode have had lengthy discussions and broad consensus about 
the kinds of criteria to take into account when encoding characters in 
general or symbols in particular.


The result has been captured in a number of documents. For example,
here's the original one from the UTC (with links to more recent
versions):
http://unicode.org/pending/symbol-guidelines.html


Unlike the list in N4085, the criteria adopted by UTC and WG2 are not 
formulated as PASS / FAIL. Instead, they were carefully designed to be 
used in assigning weight in favor or in disfavor of encoding a 
particular symbol as a character. This recognizes an important 
principle, which has been notably absent in much recent discussion: it 
is generally not possible to create any set of criteria that can be 
applied mechanistically (or algorithmically). The decision to encode a 
character is and remains a judgement call. Some calls are easy, because 
the evidence is overwhelming and direct, some calls are more difficult, 
because the evidence may be uncertain or indirect, or the nature of the 
proposed character may not be as well understood as one would ideally 
prefer.


Recognizing these inherent difficulties in the encoding work and the 
need for a set of weighing factors instead of simplistic PASS / FAIL 
criteria was one of the early breakthroughs in the work of WG2 and UTC.
Accordingly, the documents speak not of criteria "whether" to encode
characters, but criteria that "strengthen (resp. weaken) the case for 
encoding". That's a crucial difference.


While the details of these criteria (or factors) can and should be 
evaluated from time to time for continued appropriateness, the soundness 
of the general methodology is not in question, and UTC and WG2 should
resist any attempts (directly or indirectly) to abandon it in favor of
an unworkable, simplistic, and ad-hoc PASS / FAIL approach.


What are relevant criteria?

The document I cited lists the original set of criteria as follows:


 What criteria strengthen the case for encoding?

   The symbol:

 * is typically used as part of computer applications (e.g. CAD
   symbols)
 * has well defined user community / usage
 * always occurs together with text or numbers (unit, currency,
   estimated)
 * must be searchable or indexable
 * is customarily used in tabular lists as shorthand for
   characteristics (e.g. check mark, maru etc.)
 * is part of a notational system
 * has well-defined semantics
 * has semantics that lend themselves to computer processing
 * completes a class of symbols already in the standard
 * is letterlike (i.e. should vary with the surrounding font style)


 What criteria weaken the

Re: What are the present criteria...

2011-08-18 Thread Karl Pentzlin
On Thursday, 18 August 2011 at 19:24, Asmus Freytag wrote:


AF> ... The document [WG2 N4085] states that [Wingdings] 2093-2096
AF> appear to be in violation of the character glyph model. I
AF> believe that's the section (or one of the sections) in the
AF> document that Karl summarizes here as "policy statements by
AF> UTC or WG2"

Yes.

AF> The German NB introduces the concept of "indicator" of
AF> "benefit [to] the user",

That was intended as a very short summary to illustrate a concern on a
specific issue in a comment document, nothing more ...

AF>  Second, it's really unsatisfactory if each NB has their own
AF> criteria for when to add characters to the standard

... and especially the intent was not to lay out any general criteria
for the German NB or anybody else.
Thus, please do not assign too much weight to a subordinate clause
in a document which in fact aimed at other issues.

AF> What criteria strengthen the case for encoding? ...

Thank you for clarifying this in this discussion.

AF> ... The symbol:
AF> - is typically used as part of computer applications (e.g. CAD symbols)
AF> - has well defined user community / usage
AF> - always occurs together with text or numbers (unit, currency, estimated)
AF> - must be searchable or indexable
AF> - is customarily used in tabular lists as shorthand for characteristics
AF>   (e.g. check mark, maru etc.)
AF> - is part of a notational system
AF> - has well-defined semantics
AF> - has semantics that lend themselves to computer processing
AF> - completes a class of symbols already in the standard
AF> - is letterlike (i.e. should vary with the surrounding font style)

Do you agree that a considerable part of the Wingdings/Webdings symbol set
does not comply with even a single one of these criteria?

This, in short, was the concern expressed in N4085.

Then we stated that even *if* a larger set of "indicators" is applied
(examples of which were given in the short list cited in this discussion),
a considerable part of these symbols fails.
This, and only this, and only in this context, was the purpose of the
"indicator" list.

AF> ... If one agrees with the premise of encoding the Web/Wingding sets
AF> "compatibility sets" ...

(Which we in fact did not agree to when compiling the comments in WG2
 N4085, but after we learned that at least those competitors of Microsoft
 who are engaged in the UTC do not oppose this view, we no longer oppose
 it either.)

- Karl