UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
In Russian, the difference between Е and Ё is primary at the beginning
of a word as they are considered distinct letters of the alphabet, yet
secondary in the middle of a word, as the dieresis over Ё is not
mandatory. As an example, ель < ёлка, but тёлка < тель, see
http://ru.wikisource.org/wiki/Орфографический_словарь_русского_языка

A cursory scan of the UCA doesn't reveal if that's implementable, and
experiments in a fairly fresh Linux Mint yield either
ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
the LANG setting (en_US works better than ru_RU).

Could someone tell if the UCA in its current form is able to support that?

Thanks,
Leo




Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 01:31:18 -0800:
 In Russian, the difference between Е and Ё is primary at the beginning
 of a word as they are considered distinct letters of the alphabet, yet
 secondary in the middle of a word, as the dieresis over Ё is not
 mandatory.

 As an example, ель < ёлка, but тёлка < тель, see
 http://ru.wikisource.org/wiki/Орфографический_словарь_русского_языка

You say that the difference is primary in the beginning of a word but 
elsewhere secondary. And yes, that orthographic dictionary that you 
link to above looks as you describe.

However, in reality, the difference is secondary - if that is the right 
word - even as the first letter in a word. Wikipedia has the following 
example: едок < ёж < ездит.[1] And, for instance the word ёлка could
also be written елка.

Hence I would argue that the dictionary you linked to above considers 
the difference to *always* be secondary. It is just that the dictionary 
applies the sorting algorithm to a collection where the words that 
begin with the letter Ё have been separated from words that begin with 
the letter Е.

 A cursory scan of the UCA doesn't reveal if that's implementable, and
 experiments in a fairly fresh Linux Mint yield either
 ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
 the LANG setting (en_US works better than ru_RU).

(Both examples consider the difference primary, but the last 
example is incorrect as ёлка follows after тёлка - which is 
incorrect from every angle (except from the angle of the number of the 
letter inside Unicode).)
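Both Linux results treat Е/Ё as a primary difference and differ only in where ё lands. As a rough illustration (lowercase Cyrillic input and toy weights are assumptions here; this is not how glibc's locale files are actually written), plain code-point order reproduces the ru_RU result, while giving ё its own primary weight right after е reproduces the en_US one:

```python
def codepoint_key(word: str) -> str:
    # Plain code-point order: ё (U+0451) sorts after я (U+044F).
    return word


def primary_after_ie_key(word: str):
    # Toy primary weights: each letter gets 2 * its code point, and ё takes
    # the odd slot right after е (U+0435), i.e. still before ж (U+0436).
    return [2 * ord("е") + 1 if ch == "ё" else 2 * ord(ch) for ch in word]


words = ["тёлка", "ель", "тель", "ёлка"]
print(sorted(words, key=codepoint_key))         # ['ель', 'тель', 'тёлка', 'ёлка']
print(sorted(words, key=primary_after_ie_key))  # ['ель', 'ёлка', 'тель', 'тёлка']
```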

 Could someone tell if the UCA in its current form is able to support that?

Is there not a need for 3 kinds of sorting? Namely: a) Е/Ё as always 
distinct letters, b) Е/Ё as always non-distinct letters, c) Е/Ё as 
non-distinct letters except when used as the first letter. (Note that 
the last variant would only yield correct results on collections of 
words where a first-letter Ё is guaranteed to be rendered with a Ё. Thus, 
if ёлка is written елка, then the result becomes incorrect.)
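The three variants can be sketched as toy sort keys. This is a minimal illustration, assuming lowercase words and made-up integer weights (2 × code point, with ё optionally taking the slot right after е); the helper names are hypothetical. Secondary ties are broken by the positions where ё occurs, and the last lines show the caveat about елка:

```python
IE = 2 * ord("е")  # toy primary weight of е; ё as its own letter gets IE + 1


def weight(ch: str, distinct: bool) -> int:
    # ё either folds into е or takes its own slot right after е
    if ch == "ё":
        return IE + 1 if distinct else IE
    return 2 * ord(ch)


def key_a(word):
    # (a) Е/Ё always distinct letters (primary difference everywhere)
    return [weight(ch, True) for ch in word]


def key_b(word):
    # (b) never distinct: fold ё into е, then break ties on where ё occurs
    return ([weight(ch, False) for ch in word], [ch == "ё" for ch in word])


def key_c(word):
    # (c) distinct only as the first letter, secondary elsewhere
    return ([weight(ch, i == 0) for i, ch in enumerate(word)],
            [ch == "ё" for ch in word])


words = ["тёлка", "ель", "тель", "ёлка"]
for name, key in [("a", key_a), ("b", key_b), ("c", key_c)]:
    print(name, sorted(words, key=key))
# a ['ель', 'ёлка', 'тель', 'тёлка']
# b ['ёлка', 'ель', 'тёлка', 'тель']
# c ['ель', 'ёлка', 'тёлка', 'тель']

# Caveat for (c): the same word spelled without ё ends up elsewhere.
print(sorted(["ель", "ёлка", "елка"], key=key_c))  # ['елка', 'ель', 'ёлка']
```

Variant (c) reproduces the order of the wikisource dictionary, but the spelling елка lands before ель instead of after it.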

Linguistic PS: In terms of sound quality, Russian Ё is the light 
version of Russian О. (Its predecessor was also a digraph: IO.) But in 
terms of stress, when Ё loses its stress it alternates with Russian Е 
(since Е can be either stressed or unstressed, whereas Ё is always 
stressed). The reason why Е/Ё is often considered a secondary 
difference is (I think) related to stress: except in lexicons and 
dictionaries, Russian texts typically do not mark where the stress of a 
word falls. The stress is simply known by the reader/user.

[1] http://en.wikipedia.org/wiki/Ё#Russian
-- 
leif halvard silli




Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
[Philippe tells me that his message that I'm quoting could have been
rejected by the mailing list as spam; my answer is below.]

On Fri, Dec 21, 2012 at 5:13 AM, Philippe Verdy verd...@wanadoo.fr wrote:
 This is an interesting case. A solution would be to be able to define a
 distinct collation element for ^ë, where ^ means beginning of a word
 (even if there's no character encoded there). That element would be such
 that :

   e < ë < ^ë

 But this requires a prior definition of word boundaries to recognize the ^
 as an additional collation element by itself (usable distinctly only in
 context, and ignored when it occurs anywhere else, meaning that all weights
 assigned to ^ alone would be null.)

 So ^ë would become valid as a collation element, but т^ё makes no sense
 if there's no possible word boundary between т and ё.

 This would work with the UCA algorithm, which does not really mandate what
 is a collation element (not only in terms of encoding as characters), or
 any syntax to support it.

 This mechanism of incorporating word boundaries in UCA would be an
 interesting extension for section 6.9 (Handling Collation Graphemes) of
 UTS#10 (but for now there's no support for it in LDML with a defined syntax
 allowing the insertion of boundaries or other contextual conditions).

Would it also mean that using a CGJ at the beginning of a word will
cause a ё at the beginning of a word to be treated as a mid-word one?
Is <space, CGJ> a well-formed character sequence?

Leo




Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 4:56 AM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:

 You say that the difference is primary in the beginning of a word but
 elsewhere secondary. And yes, that orthographic dictionary that you
 link to above looks as you describe.

 However, in reality, the difference is secondary - if that is the right
 word - even as the first letter in a word. Wikipedia has the following
 example: едок < ёж < ездит.[1] And, for instance the word ёлка could
 also be written елка.

 [1] http://en.wikipedia.org/wiki/Ё#Russian

Wikipedia's example is sadly unsourced, unlike mine.

 Hence I would argue that the dictionary you linked to above considers
 the difference to *always* be secondary. It is just that the dictionary
 applies the sorting algorithm to a collection where the words that
 begin with the letter Ё have been separated from words that begin with
 the letter Е.

Isn't that notionally the same as having the difference primary for
the first letter?

 A cursory scan of the UCA doesn't reveal if that's implementable, and
 experiments in a fairly fresh Linux Mint yield either
 ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
 the LANG setting (en_US works better than ru_RU).

 (Both examples consider the difference primary, but the last
 example is incorrect as ёлка follows after тёлка - which is
 incorrect from every angle (except from the angle of the number of the
 letter inside Unicode).)

Right. And, ironically, the [en] collation is the correct one.

 Could someone tell if the UCA in its current form is able to support that?

 Is there not a need for 3 kinds of sorting? Namely: a) Е/Ё as always
 distinct letters, b) Е/Ё as always non-distinct letters, c) Е/Ё as
 non-distinct letters except when used as the first letter. (Note that
 the last variant would only yield correct results on collections of
 words where a first-letter Ё is guaranteed to be rendered with a Ё. Thus,
 if ёлка is written елка, then the result becomes incorrect.)

We're not talking here about *words per se* that may or may not be
rendered with a Ё, we're talking about letter sequences with Ё as a
given. The dictionary order shows that all word-initial Ёs go after
all word-initial Еs, but within a word the difference is secondary.
For a set of letter sequences using canonical spelling of words, the
collation algorithm should give their dictionary ordering, shouldn't
it?

Re the linguistic PS: you're right, and that proves that an
approximation to the proper collation using secondary ordering is
preferred to an approximation using primary ordering.

Leo




I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread Jameson Quinn
But I still intend to do this before the end of January.

Jameson


Re: UCA and Russian letter Ё

2012-12-21 Thread Markus Scherer
Resending my earlier reply. Apparently, by default, Gmail sends subject
lines in KOI8-R if they contain Cyrillic, and unicode.org rejects those as
likely spam. I just changed my Gmail settings to "Use Unicode (UTF-8)
encoding for outgoing messages" and hope this goes through. (*Please change
the subject line* if you want to discuss *this* issue.)

My earlier reply was:

Theoretically, it is possible to select collation elements based on the
proximity of word boundaries or other criteria. However, I don't know if
there is an implementation that has that built in. ICU (one of the commonly
used implementations of UCA+CLDR) does not.

It sounds like the secondary difference is ok for sorting, but you are
looking to customize an alphabetic index such that there is a separate
bucket for words beginning with Ё. I think the best would be to do that
with some custom code that looks for Ё as the first character, in addition
to the regular bucketing and sorting.
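This suggestion can be sketched as a secondary-difference sort plus one extra bucketing rule. This is plain Python with toy keys rather than ICU's AlphabeticIndex, and `regular_bucket` is a hypothetical stand-in for whatever bucketing the real index does:

```python
from collections import defaultdict


def sort_key(word: str):
    # Secondary-difference order: ё sorts as е, ties broken by where ё occurs.
    w = word.lower()
    return (w.replace("ё", "е"), [ch == "ё" for ch in w])


def regular_bucket(word: str) -> str:
    # Hypothetical stand-in for ordinary index bucketing: first letter,
    # with Ё folded into Е.
    return word[:1].upper().replace("Ё", "Е")


def bucket_label(word: str) -> str:
    # The extra custom check: a word-initial Ё gets its own bucket.
    return "Ё" if word[:1].upper() == "Ё" else regular_bucket(word)


def bucket_order(label: str):
    # Put the Ё bucket right after Е; other labels keep code-point order.
    return (ord("Е"), 1) if label == "Ё" else (ord(label), 0)


def build_index(words):
    buckets = defaultdict(list)
    for w in sorted(words, key=sort_key):  # the sort itself is unchanged
        buckets[bucket_label(w)].append(w)
    return [(label, buckets[label]) for label in sorted(buckets, key=bucket_order)]


words = ["ёлка", "ель", "тёлка", "ёж", "едок", "тель"]
for label, items in build_index(words):
    print(label, items)
# Е ['едок', 'ель']
# Ё ['ёж', 'ёлка']
# Т ['тёлка', 'тель']
```

The word order itself is the ordinary secondary-difference order; only the bucket assignment and the placement of the Ё bucket are custom.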

Best regards,
markus
-- 
Google Internationalization Engineering


Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 08:57:11 -0800:
 On Fri, Dec 21, 2012 at 4:56 AM, Leif Halvard Silli wrote:
 
 You say that the difference is primary in the beginning of a word but
 elsewhere secondary. And yes, that orthographic dictionary that you
 link to above looks as you describe.
 
 However, in reality, the difference is secondary - if that is the right
 word - even as the first letter in a word. Wikipedia has the following
 example: едок < ёж < ездит.[1] And, for instance the word ёлка could
 also be written елка.
 
 [1] http://en.wikipedia.org/wiki/Ё#Russian
 
 Wikipedia's example is sadly unsourced, unlike mine.

My Moscow Russian-Norwegian dictionary from 1987 and my Pocket Oxford Russian 
Dictionary from 2003 both list words beginning with Ё and Е under the 
same category – namely, under the letter Е. The Russian Wikipedia 
article on the letter Ё also says that this is how sorting should be 
done. 
http://ru.wikipedia.org/wiki/Ё#.D0.A1.D0.BE.D1.80.D1.82.D0.B8.D1.80.D0.BE.D0.B2.D0.BA.D0.B0
 
The article also lists xindy as one application that handles this. 
http://en.wikipedia.org/wiki/Xindy

 Hence I would argue that the dictionary you linked to above considers
 the difference to *always* be secondary. It is just that the dictionary
 applies the sorting algorithm to a collection where the words that
 begin with the letter Ё have been separated from words that begin with
 the letter Е.
 
 Isn't that notionally the same as having the difference primary for
 the first letter?

Input from a collation expert would be welcome. However, this is how I 
think: 

Should one expect such an algorithm to write the phone book on one’s 
behalf? Or that it writes the dictionary? I think that would be an 
unrealistic expectation. E.g. a dictionary or phone book has precise 
rules for how the words are written and grouped before they are sorted.

Fact is, again, that ёлка - in the wild - can be written ёлка and 
елка. So if you assume that the algorithm should only deal with ёлка, 
then you are also saying that you want the algorithm to deal with words 
that have been prepared for sorting. Thus you are talking about a well 
prepared text where ёлка is always written ёлка and not елка.

While not a definitive proof, I may also mention that the CSS list 
module defines an enumeration style based on the Russian alphabet, in 
which the ё is excluded.

http://www.w3.org/TR/css3-lists/#lower-russian

 A cursory scan of the UCA doesn't reveal if that's implementable, and
 experiments in a fairly fresh Linux Mint yield either
 ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
 the LANG setting (en_US works better than ru_RU).
 
 (Both examples consider the difference primary, but the last
 example is incorrect as ёлка follows after тёлка - which is
 incorrect from every angle (except from the angle of the number of the
 letter inside Unicode).)
 
 Right. And, ironically, the [en] collation is the correct one.

Perhaps this bug is because the Russian localizers failed to get it the 
way they wanted: Full alignment of Е and Ё? ;-) 
 
 Could someone tell if the UCA in its current form is able to support that?
 
 Is there not a need for 3 kinds of sorting? Namely: a) Е/Ё as always
 distinct letters, b) Е/Ё as always non-distinct letters, c) Е/Ё as
 non-distinct letters except when used as the first letter. (Note that
 the last variant would only yield correct results on collections of
 words where a first-letter Ё is guaranteed to be rendered with a Ё. Thus,
 if ёлка is written елка, then the result becomes incorrect.)
 
 We're not talking here about *words per se* that may or may not be
 rendered with a Ё, we're talking about letter sequences with Ё as a
 given. The dictionary order shows that all word-initial Ёs go after
 all word-initial Еs, but within a word the difference is secondary.
 For a set of letter sequences using canonical spelling of words, the
 collation algorithm should give their dictionary ordering, shouldn't
 it?

I believe the English Wikipedia article is pretty canonical when it 
says that it can be done both ways - see the sources I pointed to above 
for examples of sorting where the status as first letter doesn't matter.

I don't know why the dictionary you pointed to 
http://ru.wikisource.org/wiki/Орфографический_словарь_русского_языка 
has separated the words. It could be a technical limitation of 
MediaWiki. Or it could be because those who initiated the project felt 
it made the most sense. (It does make a lot of sense to me … he, he.) 
But that dictionary is also peculiar in that it lists words that 
begin with the letter Ы. :-) It is commonly said that no words begin 
with the letter Ы. :-) But the list managed to find some … (Including one 
word that simply means to say ы.) Neither of the dictionaries I 
mentioned above has any words under the letter Ы. Even in the 
above-mentioned CSS list module’s definition, ы is excluded.

 Re the linguistic PS: you're right, and 

Re: UCA and Russian letter Ё

2012-12-21 Thread Jukka K. Korpela

2012-12-21 21:05, Leif Halvard Silli wrote:


My Moscow Russian-Norwegian dictionary from 1987 and my Pocket Oxford Russian
Dictionary from 2003 both list words beginning with Ё and Е under the
same category – namely, under the letter Е.


This appears to be the case in any serious dictionary.

The use of the Cyrillic letter yo (ё, called IO in the Unicode name) has 
varied through the ages, but using it has never been the dominant 
spelling. According to “The World’s Writing Systems”, edited by Peter T. 
Daniels and William Bright (Oxford University Press, 1995), “The letter 
ё is used virtually only in dictionaries or language textbooks.” It may 
have become more popular on the Internet, but it is still less common 
than using the letter ye (IE, е) in its stead.



Fact is, again, that ёлка - in the wild - can be written ёлка and
елка.


And in most contexts, it is written “елка”.

It is of course possible that some people would prefer treating “ё” as a 
primarily different letter. But it’s rather illogical to require that it 
be treated that way at the start of a word only. I don’t think collation 
rules need to accommodate such preferences.


Yucca






RE: UCA and Russian letter Ё

2012-12-21 Thread Joe

 Fact is, again, that ёлка - in the wild - can be written ёлка and елка

Though you need a better dictionary: it's the diminutive of ель (as in 
Yel'tsin) meaning fir tree, and is the 4-letter word for Christmas tree.

С Рождеством (Merry Christmas),

Joe







Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 11:35 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote:
 2012-12-21 21:05, Leif Halvard Silli wrote:

 My Moscow Russian-Norwegian dictionary from 1987 and my Pocket Oxford Russian
 Dictionary from 2003 both list words beginning with Ё and Е under the
 same category – namely, under the letter Е.

 This appears to be the case in any serious dictionary.

You're right. In an influential orthographic dictionary the difference
is secondary, e.g. ёлка is between елисейский дворец and ёлки-палки:
http://lopatina-slovar.com/description/elka/34736
(The site database has been built by scanning a printed dictionary.)

However, the preferences could change, as electronic dictionaries seem
to demonstrate.

 It is of course possible that some people would prefer treating “ё” as a
 primarily different letter. But it’s rather illogical to require that it be
 treated that way at the start of a word only. I don’t think collation rules
 need to accommodate such preferences.

Granted, not yet, but by itself the argument is invalid. Unicode
collation rules are descriptive; if, for example, a language happens
to sort accents backwards, this rule has to be - and is - accommodated
despite its apparent illogicality; along the same lines, if a language
happens to make a distinction discussed in this thread, it has to be
accommodated just as well.

Also, "in several languages the rules have changed over time, and so
*older dictionaries may use a different order than modern ones* [emph.
mine - LB]. Furthermore, collation may depend on use. For example,
German dictionaries and telephone directories use different
approaches."
[http://en.wikipedia.org/wiki/Collation]

The distinction in two collation methods in German (secondary vs
expanded umlauts) is prominent enough to be mentioned in UCA. Luckily
for Germans, both methods are covered by the algorithm thanks to
requirements of other languages.
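The German case is easy to illustrate because both behaviors fit a per-string key: dictionary order treats the umlaut as a secondary (accent) difference, while phonebook order expands ü to ue at the primary level. A toy sketch, assuming lowercase input in precomposed (NFC) form; the weights and helper names are illustrative, not the CLDR data:

```python
import unicodedata

PHONEBOOK_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def accent_weights(word: str):
    # Toy secondary weights: number of combining marks on each character.
    return [len(unicodedata.normalize("NFD", ch)) - 1 for ch in word]


def dictionary_key(word: str):
    # Umlaut as a secondary difference: ü sorts with u, unaccented first.
    base = "".join(unicodedata.normalize("NFD", ch)[0] for ch in word)
    return (base, accent_weights(word))


def phonebook_key(word: str):
    # Umlaut as an expansion: ü sorts as ue at the primary level.
    expanded = "".join(PHONEBOOK_EXPANSIONS.get(ch, ch) for ch in word)
    return (expanded, accent_weights(word))


names = ["müller", "muller", "mueller", "mulzer"]
print(sorted(names, key=dictionary_key))  # ['mueller', 'muller', 'müller', 'mulzer']
print(sorted(names, key=phonebook_key))   # ['mueller', 'müller', 'muller', 'mulzer']
```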

My question is as follows: does UCA have to be modified (e.g. by
adding another bit flag "word-initial primary" next to the existing
"backward secondary") to support the feature if it were to be
implemented, or is there a way to achieve the new Russian online
collation within the existing UCA without modifying the strings to
be sorted before the application of the algorithm?
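To make the question concrete, here is a toy two-level key with the existing backward-secondary behavior next to a *hypothetical* `word_initial_primary` flag of the kind described. Nothing like this flag exists in UCA or CLDR; the weights are invented, and input is assumed to be precomposed (NFC):

```python
import unicodedata


def sort_key(word: str, backward_secondary: bool = False,
             word_initial_primary: bool = False):
    """Two-level toy key over NFD-decomposed characters.
    backward_secondary mirrors the existing (French-style) option;
    word_initial_primary is a HYPOTHETICAL flag, not part of UCA."""
    primary, secondary = [], []
    for i, ch in enumerate(word):
        base, *marks = unicodedata.normalize("NFD", ch)
        if word_initial_primary and i == 0 and marks:
            primary.append(2 * ord(base) + 1)  # own letter, right after base
        else:
            primary.append(2 * ord(base))
        secondary.append(sum(ord(m) for m in marks))  # toy accent weight
    if backward_secondary:
        secondary.reverse()  # compare accents starting from the end
    return (primary, secondary)


french = ["côté", "cote", "côte", "coté"]
print(sorted(french, key=sort_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(french, key=lambda w: sort_key(w, backward_secondary=True)))
# ['cote', 'côte', 'coté', 'côté']

russian = ["тель", "ёлка", "тёлка", "ель"]
print(sorted(russian, key=lambda w: sort_key(w, word_initial_primary=True)))
# ['ель', 'ёлка', 'тёлка', 'тель']
```

This only works here because each string is a single word; for multi-word strings the flag would first need word boundaries, which a per-string key cannot see.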

Leo




Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Jukka K. Korpela, Fri, 21 Dec 2012 21:35:16 +0200:
 2012-12-21 21:05, Leif Halvard Silli wrote:
 
 My Moscow Russian-Norwegian dictionary from 1987 and my Pocket Oxford Russian
 Dictionary from 2003 both list words beginning with Ё and Е under the
 same category – namely, under the letter Е.
 
 This appears to be the case in any serious dictionary.

In «Tolkovïj slovar’ sovremennogo russkogo jazïka» from 2005 
(«Dictionary of the Contemporary Russian Language»), words beginning 
with Ё are placed in a separate category, consisting of exactly one word: Ёмкость. 
That, and the dictionary Leo pointed to, tell me that there is a 
difference between categorization and collation.

 The use of the Cyrillic letter yo (ё, called IO in the Unicode name) 
 has varied through the ages, but using it has never been the dominant 
 spelling. According to “The World’s Writing Systems”, edited by Peter 
 T. Daniels and William Bright (Oxford University Press, 1995), “The 
 letter ё is used virtually only in dictionaries or language 
 textbooks.” It may have become more popular on the Internet, but it is 
 still less common than using the letter ye (IE, е) in its stead.

The internet has also really boomed since 1995. ;-)

 Fact is, again, that ёлка - in the wild - can be written ёлка and
 елка.
 
 And in most contexts, it is written “елка”.

Google Trends shows «ёлка» as *pretty* close, I think, but «елка» 
remains in the lead. http://www.google.com/trends/explore#q=ёлка,елка

 It is of course possible that some people would prefer treating “ё” 
 as a primarily different letter. But it’s rather illogical to require 
 that it be treated that way at the start of a word only. I don’t 
 think collation rules need to accommodate such preferences.

Right: To require it would not be in tune with praxis.
-- 
leif halvard silli




Re: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread Clive Hohberger
Don't worry, I think you now have another 5351 years until the next Mayan
Doomsday...

Happy Holidays to everyone
Clive

On Friday, December 21, 2012, Jameson Quinn wrote:

 But I still intend to do this before the end of January.

 Jameson



-- 
Clive P. Hohberger, PhD MBA
Managing Director
*Clive Hohberger, LLC*
+1 847 910 8794
cp...@case.edu


Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 1:08 PM, Leif Halvard Silli
xn--mlform-...@xn--mlform-iua.no wrote:

 In «Tolkovïj slovar’ sovremennogo russkogo jazïka» from 2005
 («Dictionary of the Contemporary Russian Language»), words beginning
 with Ё are placed in a separate category, consisting of exactly one word: Ёмкость.

This is either a mistake or a misunderstanding. There are a few dozen
words starting with Ё:
http://ru.wikisource.org/wiki/%D0%9E%D1%80%D1%84%D0%BE%D0%B3%D1%80%D0%B0%D1%84%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B8%D0%B9_%D1%81%D0%BB%D0%BE%D0%B2%D0%B0%D1%80%D1%8C_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D1%8F%D0%B7%D1%8B%D0%BA%D0%B0_%28%D0%81%29

Some online dictionaries may require you to click on a word to expand
a word range.

 That, and the dictionary Leo pointed to, tell me that there is a
 difference between categorization and collation.

You're right. A primary difference is categorizing (e.g. when many
people have to check in to an event, the waiting lines may be
categorized by several primarily distinct letters of the last name), a
secondary difference isn't. Also, speaking of dictionary vs phone book
collation, I'd like to know how Ельцин vs Ёлкин would be sorted but I
don't know how to find out. During Soviet times, the White Pages
weren't accessible to the public.

 It is of course possible that some people would prefer treating “ё”
 as a primarily different letter. But it’s rather illogical to require
 that it be treated that way at the start of a word only. I don’t
 think collation rules need to accommodate such preferences.

 Right: To require it would not be in tune with praxis.

I'm not in a rush. :)

Leo




RE: UCA and Russian letter Ё

2012-12-21 Thread Whistler, Ken
Leo Broukhis said:

 Granted, not yet, but by itself the argument is invalid. Unicode
 collation rules are descriptive;

I'm not sure what you mean by that. UTS #10 is a *specification* of an 
algorithm, with various options for tailoring and parameterization which make 
it possible to accommodate various needs for particular cases. It is not 
intended as a descriptive mechanism.

Perhaps you are referring to LDML, which includes a formal mechanism for 
describing a particular collation in terms of the default table and tailoring 
options and parameterization options of the UCA.

 if, for example,  a language happens to sort accents backwards, this
 rule has to be - and is - accommodated despite its apparent
 illogicality;

Backwards accent secondary weighting was actually included primarily because of 
prior art in collation standards, because of the need to be able to synchronize 
the UCA algorithm with ISO 14651, and because it makes it easier to explain 
how folks can implement versions of multi-level collation which can pass the 
conformance tests of the Canadian sorting standard, etc.

 along the same lines, if a language happens to make a distinction
 discussed in this thread, it has to be accommodated just as well.

No, I don't think so.

It is rather easy to come up with distinctions or collation requirements which 
simply cannot be accommodated within the intended bounds of the UCA. For 
example, sorting all numerical expressions mixed with text strictly by their 
numeric values, or sorting all (or some specified list) of abbreviations as if 
they were spelled out, and so forth.

Many lexicographical ordering rules cannot be fully accommodated within the 
context of the UCA algorithm, which is a multilevel *string comparison* 
specification, and not a dictionary ordering specification.

 
 My question is as follows: does UCA have to be modified (e.g. by
 adding another bit flag "word-initial primary" next to the existing
 "backward secondary") to support the feature if it were to be
 implemented, or is there a way to achieve the new Russian online
 collation within the existing UCA without modifying the strings to
 be sorted before the application of the algorithm?

I don't think there is any out-of-the-box way to use UCA so that an 
implementation would automatically recognize a word boundary context and weight 
characters conditionally based on that context. So no, I don't think you could 
get an implementation to do that without first marking up text with additional 
characters to indicate word boundaries and then tailoring the weight table to 
weight sequences including that markup accordingly.
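The approach described above (mark word boundaries first, then tailor weights for sequences containing the markup) might look like this in a toy setting. The marker character, the regex-based segmentation and the weights are all assumptions for illustration; a real system would need a proper, language-tuned segmenter:

```python
import re

MARK = "\u0001"  # hypothetical boundary marker; assumed never to occur in the data


def mark_word_starts(text: str) -> str:
    # Crude stand-in for a language-tuned word segmenter (cf. UAX #29):
    # insert MARK wherever a run of word characters begins.
    return re.sub(r"\b(?=\w)", MARK, text)


def tailored_key(text: str):
    # Tailored toy table: MARK+ё is one element with its own primary after е;
    # MARK alone is ignorable; any other ё is е plus a secondary difference.
    primary, secondary = [], []
    pattern = re.escape(MARK) + "ё|" + re.escape(MARK) + "|."
    for token in re.findall(pattern, text, flags=re.S):
        if token == MARK:
            continue
        if token == MARK + "ё":
            primary.append(2 * ord("е") + 1)
            secondary.append(0)
        else:
            primary.append(2 * ord("е" if token == "ё" else token))
            secondary.append(int(token == "ё"))
    return (primary, secondary)


phrases = ["зелёная ёлка", "зелёная ель", "зелёная елка"]
# Without markup the key cannot tell that ёлка starts a word:
print(sorted(phrases, key=tailored_key))
# ['зелёная елка', 'зелёная ёлка', 'зелёная ель']
# With word starts marked first, word-initial ё becomes a primary difference:
print(sorted(phrases, key=lambda p: tailored_key(mark_word_starts(p))))
# ['зелёная елка', 'зелёная ель', 'зелёная ёлка']
```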

This is actually derived trivially from the fact that UCA knows nothing 
whatsoever about word boundaries. At core, it is just a mechanism to take a 
string input and provide an output vector of collation weights. You would have
to hook it up to a text segmentation algorithm to even identify 
words, and then that text segmentation algorithm would itself have to be 
tailored and tuned to whatever language you had in mind, because the criteria 
for identifying words will vary from language to language, and even 
orthography to orthography.

But there is another possible sense of the question, does UCA have to be 
modified... to support..., i.e. is the UTC somehow required to augment the 
algorithm to support some particular kind of behavior for a particular 
language's sorting rules, just because someone has turned up particular odd 
behavior. And I think the answer to that is clearly no. Oh, and by the way, I 
don't think LDML must (or should) be augmented to enable it to describe any and 
all lexicographical ordering practices, either. That isn't the function of LDML.

--Ken





RE: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Joe, Fri, 21 Dec 2012 12:48:47 -0800:
 
 Fact is, again, that ёлка - in the wild - can be written ёлка and елка
 
 Though you need a better dictionary: it's the diminutive of ель (as 
 in Yel'tsin) meaning fir tree, and is the 4-letter word for 
 Christmas tree.

The dictionary of Dal,[1] says: «Ель, ели́на, умал. ёлка [snip]», which 
ought to mean that ёлка is a diminutive of ель. My impression is the 
same as yours with regard to the Christmas tree/New year tree meaning, 
but many dictionaries do list fir tree as the primary meaning of ёлка 
and Christmas/New year tree as a secondary meaning.

[1] http://en.wikipedia.org/wiki/Vladimir_Dal

С праздником! (Happy holidays!)
-- 
leif halvard silli




Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 13:43:14 -0800:
 On Fri, Dec 21, 2012 at 1:08 PM, Leif Halvard Silli
 xn--mlform-...@xn--mlform-iua.no wrote:
 
 In «Tolkovïj slovar’ sovremennogo russkogo jazïka» from 2005
 («Dictionary of the Contemporary Russian Language»), words beginning
 with Ё are placed in a separate category, consisting of exactly one word: Ёмкость.
 
 This is either a mistake or a misunderstanding. [ snip ]

Not at all. The dictionary I referred to is a dictionary on paper which 
only contains new words or words with changed meaning etc. Thus, a 
dictionary of hot words for the time being. That particular 
dictionary only found room for one such word beginning with ё-. :-)

 That, and the dictionary Leo pointed to, tell me that there is a
 difference between categorization and collation.
 
 You're right. A primary difference is categorizing (e.g. when many
 people have to check in to an event, the waiting lines may be
 categorized by several primarily distinct letters of the last name), a
 secondary difference isn't. Also, speaking of dictionary vs phone book
 collation, I'd like to know how Ельцин vs Ёлкин would be sorted but I
 don't know how to find out. During Soviet times, the White Pages
 weren't accessible to the public.

I think that this is definitely one thing that can be affected by 
electronic media. But I just checked how Thunderbird sorts words 
beginning with Ё- and Е-, and it treats them as one and the same, even 
when Ё is the first letter of the word. That makes sense to me in such 
an uncategorized medium as a list of e-mails, since users want to 
verify for themselves that they have seen all the messages. However, I 
agree that in a dictionary etc., it could probably make sense to 
have separate categories for Ё and Е. 

The question is whether categorization is a subject for a collation algorithm.
-- 
leif halvard silli




Re: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread Julian Bradfield
On 2012-12-21, Clive Hohberger cp...@case.edu wrote:
 Don't worry, I think you now have another 5351 years until the next Mayan
 Doomsday...

It's only 394 years till the next b'ak'tun.
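The 394-year figure follows from the Long Count place values (20 k'in = 1 winal, 18 winal = 1 tun, then factors of 20). A quick check, assuming the commonly used GMT correlation that puts 13.0.0.0.0 on 21 December 2012:

```python
from datetime import date, timedelta

# Long Count place values, in days.
KIN = 1
WINAL = 20 * KIN      # 20 days
TUN = 18 * WINAL      # 360 days
KATUN = 20 * TUN      # 7,200 days
BAKTUN = 20 * KATUN   # 144,000 days

# Assumption: 13.0.0.0.0 = 2012-12-21 (GMT correlation).
end_of_13th = date(2012, 12, 21)
end_of_14th = end_of_13th + timedelta(days=BAKTUN)
years_per_baktun = BAKTUN / 365.2425  # mean Gregorian year

print(end_of_14th, round(years_per_baktun, 1))  # 2407-03-26 394.3
```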

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




RE: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread Doug Ewell
And as you've no doubt heard to death by now, real Maya don't believe in that 
apocalyptic mumbo-jumbo anyway. Today was a celebration.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell






Re: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread John H. Jenkins
http://xkcd.com/998/

On Dec 21, 2012, at 4:22 PM, Doug Ewell d...@ewellic.org wrote:

 And as you've no doubt heard to death by now, real Maya don't believe in that 
 apocalyptic mumbo-jumbo anyway. Today was a celebration.
 



Re: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-21 Thread Philippe Verdy
If the name of this ending baktun really means "rebirth" or "renaissance",
then the real catastrophe occurred 394 years ago, in 1618, because of
the conquest of America by Spanish troops, which meant the death of huge
numbers of Amerindians (most of them from imported infections, against
which Amerindians had no protection, but also because the development of
the Mayan civilization had ended through their internal wars and their
concentration in giant cities which lacked the resources to survive in
a more concentrated territory).
So since 1618, the Mayan civilization has changed completely, and the
Mayans became a minority. They are now being recognized with more respect
(including for their native languages, which were largely ignored when
Spanish became official). The "rebirth" or "renaissance" thus started with
the European Renaissance (which ended the Middle Ages), and this is a
strange coincidence.

Well, not everybody thinks that this ending baktun was really the last one
in the cycle (are there 12 or 13 baktuns in the longest cycle?). The Mayans
may have just stopped counting after that, most likely because their past
civilization was already ending at that time; or nobody knows how they
counted these long cycles of more than 12 or 13 baktuns, or whether they
counted them starting from zero or one. Even the Mayans can't tell when
the historic first cycle really occurred in the past, just as we don't
really know where to start the proleptic Julian calendar (in 4714 BC,
really?), in a time when the history of dates was not written.

(Even the counting of years from the birth of Jesus Christ in the Christian
era is a reconstruction devised in the 6th century, several centuries after
the Julian calendar was normalized: years were still counted by the reign
of a Roman emperor or other political rulers in separate eras, even if the
length of a year was formalized in 325 at the Council of Nicaea to unify
the various Julian calendars used in Europe. So even today, we still don't
know the exact date Jesus Christ was really born, or exactly when the Julian
calendar started under Julius Caesar: various options are still possible
for matching Roman eras with the modern Julian calendar still used today,
notably on questions such as whether 4 AD was a leap year or not, and when
Augustus really suspended the triennial leap years, which determine when
the Julius Caesar calendar really started. The most recent proposal for
matching dates in the Julius Caesar era was made in 2003 after the
discovery of an old Egyptian papyrus matching Julius Caesar's era years
with Egyptian dates.)

If we don't know when the Julian calendar started, then we also don't
really know how to match the Gregorian calendar with it. We also don't know
exactly when the Mayan calendar started, or whether this was also just a
reconstruction (with more or less historic approximations). Let's just
learn one thing: human history has not been written down for very long,
and most of it has been reconstructed with lots of errors (notably of
interpretation of old texts, or because we can't determine whether these
old texts were accurate at the time they were written).