On Mon, Jul 23, 2012 at 12:17 AM, Mark Rejhon <marky...@gmail.com> wrote:
>>> 19. Edit deferred -- Explanation given in previous email. It helps
>>> reader associate WHICH definition of "character" we are using. Even the
>>> RFC's say that the word has multiple interpretations, so it's appropriate
>>> here in the title. The title is like a glossary entry, and the contents
>>> explain we're using code points as the method of counting characters.
>>>
>> I still regard this dangerous and confusing. We are counting Unicode
>> code points, and that needs to be clear in all explanations.
>>
>
> We will have to agree to disagree -- I think it's safer and less confusing:
> Did you know there are 47 occurrences of the word "character" in the whole
> document?
>
> Therefore, I prefer not to remove the word "Character" in the heading
> "Unicode Character Counting". Thus, it is like the heading of an extended
> *glossary* definition here -- and it is in my opinion safer and less
> confusing. Obviously, the section is too big to move to the glossary
> section, but I am open to alternate ideas of defining the word "character"
> from this mailing list.
> For this, I defer to public comment (once 0.5 is up).
>

Referring to http://unicode.org/glossary/ , which says the following:

*Code Point <http://unicode.org/glossary/#code_point>*. (1) Any value in the
Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆. (See
definition D10 in Section 3.4, Characters and Encoding
<http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf#G2212>.) Not all code
points are assigned to encoded characters. See *code point type
<http://unicode.org/glossary/#code_point_type>*. (2) *A value, or position,
for a character, in any coded character set.*

Other rationale:
- Other XEPs use "character" terminology.
- People are already familiar with "character" terminology.
- There are 47 occurrences of the word "character" in XEP-0301 (e.g.
"...Remove 1 character from...").
- Search-and-replacing all of them with "code point" would make the document
_even_ more confusing to those who are not familiar with "code point"
terminology.
- Therefore, I feel that the lesser of two evils is to treat "Unicode
Character Counting" as the definition of XEP-0301's use of the word
"character". If an implementer misinterprets the word "character", this
section clarifies it.
- If several people here agree with Gunnar that "Unicode Character Counting"
should be renamed to "Unicode Code Point Counting", they would probably also
agree that the word still needs to be defined somewhere else -- such as in
the Glossary section. (The word "character" has multiple interpretations, so
XEP-0301 needs to define it from its own perspective; I chose the "Unicode
Character Counting" section to serve as that definition.)

I am open to alternative ways of defining "character", but any alternative
needs to be less confusing, not more confusing.

I'd like to hear opinions from others about this matter, as well as general
comments about "Accurate Processing of Action Elements" (which includes
"Unicode Character Counting"):
http://xmpp.org/extensions/xep-0301.html#accurate_processing_of_action_elements

Thanks,
Mark Rejhon