RE: informative due to variation across languages

2001-06-19 Thread James E. Agenbroad

On Tue, 19 Jun 2001, Marco Cimarosti wrote:

> Peter Constable wrote:
> > Can anyone think of other examples of informative properties 
> > that are so
> > because the property is typical but not true for all languages?
> 
>[snip]
I arrived late to this discussion.  Is "culturally correct" sorting/filing
such a property?  I believe the Japanese and Koreans sort/file Kanji/Hani
phonetically--as if they were written in kana and hangul--and that
software cannot be expected to derive the kana from the kanji. I think it
is also the case that "good" sorting of Latin, Cyrillic, and Arabic scripts
is language dependent (and maybe that of other scripts too).
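
A minimal sketch of the point above, in Python. The READINGS table and the
sort_by_reading helper are hypothetical and exist only for illustration: the
kana readings have to be supplied by a person or a dictionary, because they
cannot be derived from the kanji alone. Comparing kana by code point is
itself only a rough stand-in for real Japanese collation.

```python
# Hypothetical reading table: kanji spelling -> kana reading.
# In practice this data must come from a human or a dictionary.
READINGS = {
    "山田": "やまだ",
    "青木": "あおき",
    "鈴木": "すずき",
}


def sort_by_reading(names, readings):
    """Sort names by their supplied kana reading, not by kanji code point."""
    return sorted(names, key=lambda name: readings[name])


names = ["山田", "青木", "鈴木"]

# Plain code-point order of the kanji themselves.
by_code_point = sorted(names)

# Phonetic order, using the externally supplied readings.
by_reading = sort_by_reading(names, READINGS)

print(by_code_point)
print(by_reading)
```

The two orders differ, which is the sense in which "correct" sorting here is
not a property of the characters themselves.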
 Regards,
  Jim Agenbroad ( [EMAIL PROTECTED] )
 The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.  





RE: informative due to variation across languages

2001-06-19 Thread Marco Cimarosti

Peter Constable wrote:
> Can anyone think of other examples of informative properties 
> that are so
> because the property is typical but not true for all languages?

Is it stretching things too much to say that glyphs (the representative
glyphs as published in TUS) are informative character properties?

If not, the fact that different languages may use different glyphs can be
seen as one reason why there cannot be "normative glyphs". Of course it is
just one of zillions of reasons, and probably not even the most important
one.

However, the fact that glyphs may depend on language is particularly
important in some contexts, namely CJK characters. E.g., I think that all
the stroke count information in UniHan.txt is informative (it is, right?)
because the counting depends on the actual glyphs, and the glyphs partly
depend on which language is considered.

> Can anyone give me a specific example of why Line Breaking or 
> East Asian Width properties aren't normative?

East Asian Width could be seen as another example of a property which is
informative because it depends on actual glyphs, which in turn depend on the
actual language. E.g., the whole East Asian Width property is meaningful
only for systems which implement East Asian typography.
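
As a rough illustration, Python's standard unicodedata module exposes the
East Asian Width value of a character. The display_columns helper below is
hypothetical: it is a naive column counter that ignores combining marks, and
its ambiguous_is_wide flag stands for the kind of context (East Asian
typography or not) that the property value alone does not settle.

```python
import unicodedata


def display_columns(text, ambiguous_is_wide=False):
    """Naively estimate terminal columns from East Asian Width values.

    Wide (W) and Fullwidth (F) count as 2 columns. Ambiguous (A) counts
    as 2 only when the caller says the context is East Asian.
    Everything else counts as 1. Combining marks are not handled.
    """
    total = 0
    for ch in text:
        width = unicodedata.east_asian_width(ch)
        if width in ("W", "F") or (width == "A" and ambiguous_is_wide):
            total += 2
        else:
            total += 1
    return total


# Property values for a few sample characters.
samples = ["A", "\uFF21", "\u4E2D", "\uFF71", "\u03B1"]
widths = {ch: unicodedata.east_asian_width(ch) for ch in samples}
print(widths)

# The same string measures differently depending on the assumed context.
print(display_columns("\u03B1\u4E2D"))
print(display_columns("\u03B1\u4E2D", ambiguous_is_wide=True))
```

The Greek alpha is classed as Ambiguous, so its width depends on a decision
made outside the character data.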

_ Marco




Re: informative due to variation across languages

2001-06-15 Thread Peter_Constable


>Well, not exactly. "It's normative" *means* that xyz. But "It's normative"
>*because* the Unicode Standard says so, which in turn is because the
>UTC voted that it be so.
>
>*Why* they voted so may be an interesting historical question in
>particular instances, but it may be beyond the necessities of
>didactic explanation. A little bit like asking why cardinals are
>red and bluebirds are blue, when you get down to it. Maybe there
>actually *is* a real reason (or reasons) for that, but it is probably too
>complicated to figure out, and ultimately beside the point for
>people who just need to be able to distinguish cardinals from bluebirds.

Precisely. If I'm teaching someone about Unicode, I need to give them some
hook on which to hang the normative vs. informative distinction, and the
most accurate answer, viz. "because the UTC decided so", is a bit too
abstract. Something like

- "it doesn't work the same way in all cases" or
- "it's just additional documentation that has no implications for how
processes should behave" or
- "the issues aren't yet well enough understood to set it all in stone" or
- "there are a bunch of compatibility characters for which the values are
unclear or controversial"

works, though. But I guess the first of these explanations can't easily be
used anymore, now that case mappings are normative.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>






Re: informative due to variation across languages

2001-06-15 Thread Peter_Constable


>> But normative explicitly does *not* mean unchangeable.
>
>It quite specifically means that others can use it and reference it. Anyone
>knows you cannot build a house on a shifting foundation, which is why making
>something "normative" should be something reserved for things that one is
>*not* going to change.

Sorry, but check out the text on p. 73 of TUS 3.0:


The term normative when applied to a character property does *not* mean
that the value of the property will never change. Corrections and
extensions to the standard in the future may require minor changes to
normative values, even though the Unicode Technical Committee strives to
minimize such changes.


It is true that *some* normative properties (and some informative
properties, e.g. Unicode 1.0 Name) are unchangeable, but it is not true
that *all* are. Case in point, the combining classes underwent a lot of
changes from TUS 2.1.9 to 3.0, and consideration is being given to further
changes (though decidedly less drastic).
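
For readers who want to see what a combining class looks like in practice,
here is a small sketch using Python's standard unicodedata module. The
values it reports come from whichever Unicode version is bundled with the
interpreter (exposed as unidata_version), so they are tied to a version of
the standard rather than fixed for all time. The combining_class wrapper is
just an illustrative name.

```python
import unicodedata


def combining_class(ch):
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)


# Version of the Unicode Character Database this interpreter ships with.
UNICODE_VERSION = unicodedata.unidata_version

acute = combining_class("\u0301")      # COMBINING ACUTE ACCENT
dot_below = combining_class("\u0323")  # COMBINING DOT BELOW
letter = combining_class("a")          # base letters have class 0

# Canonical ordering sorts adjacent combining marks by ascending class,
# so normalization moves the dot below in front of the acute accent.
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")

print(UNICODE_VERSION, acute, dot_below, letter)
print([hex(ord(ch)) for ch in reordered])
```

Because normalization depends on these values, a change to a combining
class changes the output of conformant processes, which is why such changes
are kept to a minimum.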


- Peter








Re: informative due to variation across languages

2001-06-15 Thread Kenneth Whistler

Peter continued:

> Indeed: e.g. that is true for the Unicode 1.0 Name property. My question,
> though, is whether there are some properties that are informative because
> they may be typical for most languages but not true for all. It was always
> my impression that that was the reason for case mappings having been
> informative. Was I wrong in that assumption?

No, you are probably right. Everybody knew from the start (of Unicode)
that case mappings were going to have exceptions. In the absence of
a Character Property Model to guide thinking about how to pigeonhole
things, that turned into an implicit assumption that case-mapping as
a whole should be informative, since we knew that there were going to
be locale-based exceptions for a few well-defined cases.

Recently, it became clearer that the case properties themselves and
the default case mappings had all kinds of firm implications for
many processes that people were implementing, and that it was
inadvisable to keep on saying that case mapping in toto was
informative. This led to the switchover to say that all of this
was normative, together with the formal SpecialCasing.txt way of
enumerating the exceptions.
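
A minimal sketch of that split, in Python. The built-in str.upper() applies
the default (language-independent) mappings, including unconditional
special cases such as German sharp s. The upper_turkic function is a
hypothetical helper, not a library API: it hand-codes only the one
Turkish/Azeri exception for dotted i, as an example of a locale-based
tailoring layered on top of the defaults.

```python
def upper_default(text):
    """Uppercase using the default, language-independent mappings."""
    return text.upper()


def upper_turkic(text):
    """Uppercase with the Turkish/Azeri dotted-i exception (sketch only).

    In these languages, lowercase 'i' uppercases to U+0130 (capital I
    with dot above) rather than to 'I'. Dotless U+0131 already maps to
    'I' under the default mappings, so it needs no special handling.
    """
    return text.replace("i", "\u0130").upper()


print(upper_default("istanbul"))
print(upper_turkic("istanbul"))
print(upper_default("stra\u00dfe"))
```

The default result is the same for every locale; only a caller who knows
the text is Turkish or Azeri should apply the tailored version.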

Offhand, I can't think of any other instances of properties that
were explicitly labelled "informative" because of language-specific
behavior. These are *character* properties after all, and the
characters themselves are not language-specific. I suppose it would
be possible to manufacture another instance comparable to case mapping,
but it would probably be rather odd and presumably wouldn't apply to any
existing property lists in the Unicode Character Database.

> 
> The real issue is that I'm trying to find ways to explain to someone why
> there are distinctions between normative and informative behaviours and
> properties. 

That's easy. Just as for any Dad responding to the kid's "Why is, Daddy?"
questions, you ultimately end up giving the ultimate answer: "Because
that's just the way it is." ;-)

> 
> Which isn't really helpful for my purposes here, which are didactic: "It's
> normative because conformant implementations have to follow it, and they
> have to follow it because it's normative."

Well, not exactly. "It's normative" *means* that xyz. But "It's normative"
*because* the Unicode Standard says so, which in turn is because the
UTC voted that it be so.

*Why* they voted so may be an interesting historical question in
particular instances, but it may be beyond the necessities of
didactic explanation. A little bit like asking why cardinals are
red and bluebirds are blue, when you get down to it. Maybe there
actually *is* a real reason (or reasons) for that, but it is probably too
complicated to figure out, and ultimately beside the point for
people who just need to be able to distinguish cardinals from bluebirds.

--Ken

> 
> 
> >Because no one is yet convinced that the specifics of either are
> >so widely agreed upon that the UTC would want to make
> >some strong claim about conformance to the particular properties
> >and their values for implementations of the behavior.
> 
> Now that works.




Re: informative due to variation across languages

2001-06-15 Thread Michael \(michka\) Kaplan

From: <[EMAIL PROTECTED]>
> On 06/15/2001 06:29:51 PM "Michael \(michka\) Kaplan" wrote:

> >Why be more specific than there are a lot of people who think they might
> >possibly have made TOO MUCH normative and do not want to make things
> >unchangeable that might be in error or might need to change later?
>
> But normative explicitly does *not* mean unchangeable.

It quite specifically means that others can use it and reference it. Anyone
knows you cannot build a house on a shifting foundation, which is why making
something "normative" should be something reserved for things that one is
*not* going to change.

michka





Re: informative due to variation across languages

2001-06-15 Thread Peter_Constable


On 06/15/2001 06:28:34 PM Kenneth Whistler wrote:

>Peter asked:
>
>> It used to be that one could describe informative properties saying, "some
>> properties are valid for most languages but not all and so are informative,
>> such as case mappings".
>
>This never really was the case, since from the moment that the UTC started
>posting informative properties, there were some that had nothing to do
>with language differences.

Indeed: e.g. that is true for the Unicode 1.0 Name property. My question,
though, is whether there are some properties that are informative because
they may be typical for most languages but not true for all. It was always
my impression that that was the reason for case mappings having been
informative. Was I wrong in that assumption?

The real issue is that I'm trying to find ways to explain to someone why
there are distinctions between normative and informative behaviours and
properties. The Unicode 1.0 Name typifies one reason for having an
informative property (which I take to be that it is historical
documentation that is relevant for implementations based on TUS 1.0 but
that otherwise has no bearing on implementations). I'm trying to motivate
the reason for other informative properties.


>Chapter 4 *does* define normative and informative properties, but
>does so in terms of what a claim of conformance to the property
>means.
>
>I think this is basically correct: normativity has to do with what
>a claim of conformance means, rather than what kind of real-world
>property we are dealing with. This is part of the reason why
>a formerly informative property can change its status to become
>normative.

Which isn't really helpful for my purposes here, which are didactic: "It's
normative because conformant implementations have to follow it, and they
have to follow it because it's normative."


>Because no one is yet convinced that the specifics of either are
>so widely agreed upon that the UTC would want to make
>some strong claim about conformance to the particular properties
>and their values for implementations of the behavior.

Now that works.



- Peter








Re: informative due to variation across languages

2001-06-15 Thread Peter_Constable


On 06/15/2001 06:29:51 PM "Michael \(michka\) Kaplan" wrote:

>> Can anyone give me a specific example of why Line Breaking or East Asian
>> Width properties aren't normative?
>
>Why be more specific than there are a lot of people who think they might
>possibly have made TOO MUCH normative and do not want to make things
>unchangeable that might be in error or might need to change later?

But normative explicitly does *not* mean unchangeable.



- Peter








Re: informative due to variation across languages

2001-06-15 Thread Kenneth Whistler

Peter asked:

> It used to be that one could describe informative properties saying, "some
> properties are valid for most languages but not all and so are informative,
> such as case mappings".

This never really was the case, since from the moment that the UTC started
posting informative properties, there were some that had nothing to do
with language differences. 

> Case mappings gave an easy example for why to have
> informative properties. Now that the mappings are informative (with
> normative exceptions listed in SpecialCasing.txt), 

vice-versa, actually

> it's harder to give an
> easy explanation for why some properties are informative.

This comes down to the lack of what I call a "Character Properties Model"
for Unicode.

Asmus Freytag has been working on one side of this problem in an
as yet not public draft for UTR #23 "Survey of Unicode Character
Properties and Guidelines" that the UTC has been kicking around.

Chapter 4 *does* define normative and informative properties, but
does so in terms of what a claim of conformance to the property
means.

I think this is basically correct: normativity has to do with what
a claim of conformance means, rather than what kind of real-world
property we are dealing with. This is part of the reason why
a formerly informative property can change its status to become
normative.

> 
> Can anyone think of other examples of informative properties that are so
> because the property is typical but not true for all languages?
> 
> Can anyone give me a specific example of why Line Breaking or East Asian
> Width properties aren't normative?

Because no one is yet convinced that the specifics of either are
so widely agreed upon that the UTC would want to make
some strong claim about conformance to the particular properties
and their values for implementations of the behavior.

Put another way, if someone claims that they are doing "Unicode
line breaking", are we yet ready to examine their line breaks
and declare them non-conformant if they make some different
choices than the informative values specified in LineBreak.txt?

On the other hand, if an API purports to be returning the
"Unicode General Property" of a character, and it returns
"Ps" instead of "Lo" for an ideograph at some version of Unicode,
I think we could now agree that that was a non-conformant API, even
though formerly both "Ps" and "Lo" were considered "informative"
values of the General Category.
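
To make the General Category case concrete, here is a short sketch against
Python's standard unicodedata module, which is one such API. The
general_category wrapper is only an illustrative name. An implementation
that returned "Ps" for the ideograph below would be the kind of
non-conformant API described above.

```python
import unicodedata


def general_category(ch):
    """Return the Unicode General Category of a single character."""
    return unicodedata.category(ch)


ideograph = general_category("\u4E2D")  # a CJK unified ideograph
open_paren = general_category("(")      # LEFT PARENTHESIS
capital_a = general_category("A")       # LATIN CAPITAL LETTER A

print(ideograph, open_paren, capital_a)
```

Here the ideograph is reported as "Lo" (Letter, other) and the parenthesis
as "Ps" (Punctuation, open).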

--Ken








Re: informative due to variation across languages

2001-06-15 Thread Michael \(michka\) Kaplan

From: <[EMAIL PROTECTED]>

> Can anyone give me a specific example of why Line Breaking or East Asian
> Width properties aren't normative?

Why be more specific than there are a lot of people who think they might
possibly have made TOO MUCH normative and do not want to make things
unchangeable that might be in error or might need to change later? Seems
like a nice, conservative course to me to not lock down stuff that you might
want to change.

michka