Re: [NTG-context] sort-lan.lua nitpicks and sorting

2010-05-07 Thread Hans Hagen

On 2-5-2010 3:59, Philipp Gesang wrote:


1. In sort-lan.lua, line 101 should read «['r'] = r», and line 144
«['r'] = 26, -- r».


I patched the file.


2. Although I read the disclaimer about said file being “preliminary and
incomplete” -- is there some rationale behind the range of integers for
each language mapping? The mapping for English goes from 1 to 51,
interleaving 2 integers for each letter (which is odd because it should
start from index 3 with “a”, shouldn't it?), while the Czech one goes
from 1 to 40 without skipping, Finnish and Austrian from 1 to 58.


Some old (Ruby) code was reused, etc.


   What about mapping them onto a larger but common scale that would
alleviate multilingual sorting so that the alphabetical representation
of the phoneme /a/ maps to the same value over different languages?†
E.g.
   [a] = 3, -- in a Latin mapping,
   [α] = 3, -- in Greek mapping,
   [а] = 3, -- in a Russian mapping.


Hm, interesting ... feel free to reshuffle and provide patches.


†   I know this is impractical for many writing systems and even within
the set of Latin or Greek based alphabets it largely depends on a given
purpose how much precision you need in sorting.


Indeed, but we can have multiple variants and are not bound to specific
conventions.
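
The common-scale idea from point 2 could be sketched in Lua roughly as follows. This is an illustration only, not the layout of sort-lan.lua: the table shared_rank, the helper functions, and the concrete rank values are all assumptions made up for this example.

```lua
-- Hypothetical shared scale: letters for a comparable sound get the same
-- rank in every script. Names and values are illustrative only.
local shared_rank = {
    ["a"] = 3,  ["α"] = 3,  ["а"] = 3,   -- Latin a, Greek alpha, Cyrillic a
    ["d"] = 9,  ["δ"] = 9,  ["д"] = 9,   -- Latin d, Greek delta, Cyrillic de
    ["k"] = 23, ["κ"] = 23, ["к"] = 23,  -- Latin k, Greek kappa, Cyrillic ka
    ["m"] = 27, ["μ"] = 27, ["м"] = 27,  -- Latin m, Greek mu,    Cyrillic em
}

-- Turn a UTF-8 word into a list of ranks. The byte pattern matches one
-- UTF-8 character at a time, so no utf8 library is required.
local function sort_key(word)
    local key = {}
    for c in word:gmatch("[\1-\127\194-\244][\128-\191]*") do
        key[#key + 1] = shared_rank[c] or 0   -- unknown characters sort first
    end
    return key
end

-- Compare two words rank by rank; a shorter word wins a tie on its prefix.
local function less(a, b)
    local ka, kb = sort_key(a), sort_key(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then
            return ka[i] < kb[i]
        end
    end
    return #ka < #kb
end

local words = { "мак", "dam", "κακα", "ака" }
table.sort(words, less)
print(table.concat(words, " "))
```

With such a table, words from different scripts interleave in a single list. Language-specific variants could then override individual entries where a convention differs.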


Hans

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
 | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___


Re: [NTG-context] sort-lan.lua nitpicks and sorting

2010-05-03 Thread Philipp Gesang
On 2010-05-02 15:59:53, Philipp Gesang wrote:
 Hi again,
 
 
 1. In sort-lan.lua, line 101 should read «['r'] = r», and line 144
 «['r'] = 26, -- r».

Lines 109 and 152, which concern the character “ů” (uring in Unicode
speak), contain a typo: the key should be “uc(0x016F)” instead of
“uc(0x01F6)”.

The long vowels “ó” and “ý” are missing as well; they belong after
their short counterparts. I attach a diff for the file.

Philipp


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
--- /home/laokoon/base/sort-lan.lua 2010-04-07 23:10:04.0 +0200
+++ sort-lan.lua 2010-05-03 09:28:23.813291928 +0200
@@ -98,7 +98,8 @@
 ['o']= o,
 ['p']= p,
 ['q']= q,
-['s']= r,
+['r']= r,
+[uc(0x00F3)] = uc(0x00F3), -- oacute
 [uc(0x0147)] = uc(0x0147), -- rcaron
 ['s']= s,
 [uc(0x0161)] = uc(0x0161), -- scaron
@@ -106,11 +107,12 @@
 [uc(0x0165)] = uc(0x0165), -- tcaron
 ['u']= u,
 [uc(0x00FA)] = u,
-[uc(0x01F6)] = u,
+[uc(0x016F)] = u,
 ['v']= v,
 ['w']= w,
 ['x']= x,
 ['y']= y,
+[uc(0x00FD)] = uc(0x00FD), -- yacute
 ['z']= z,
 [uc(0x017E)] = uc(0x017E), -- zcaron
 }
@@ -139,23 +141,25 @@
 ['n']= 21, -- n
 [uc(0x0147)] = 22, -- ncaron
 ['o']= 23, -- o
-['p']= 24, -- p
-['q']= 25, -- q
-['s']= 26, -- r
-[uc(0x0147)] = 27, -- rcaron
-['s']= 28, -- s
-[uc(0x0161)] = 29, -- scaron
-['t']= 30, -- t
-[uc(0x0165)] = 31, -- tcaron
-['u']= 32, -- u
-[uc(0x00FA)] = 33, -- uacute
-[uc(0x01F6)] = 34, -- uring
-['v']= 35, -- v
-['w']= 36, -- w
-['x']= 37, -- x
-['y']= 38, -- y
-['z']= 39, -- z
-[uc(0x017E)] = 40, -- zcaron
+[uc(0x00F3)] = 24, -- oacute
+['p']= 25, -- p
+['q']= 26, -- q
+['r']= 27, -- r
+[uc(0x0147)] = 28, -- rcaron
+['s']= 29, -- s
+[uc(0x0161)] = 30, -- scaron
+['t']= 31, -- t
+[uc(0x0165)] = 32, -- tcaron
+['u']= 33, -- u
+[uc(0x00FA)] = 34, -- uacute
+[uc(0x016F)] = 35, -- uring
+['v']= 36, -- v
+['w']= 37, -- w
+['x']= 38, -- x
+['y']= 39, -- y
+[uc(0x00FD)] = 40, -- yacute
+['z']= 41, -- z
+[uc(0x017E)] = 42, -- zcaron
 }
 
 -- French
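
Hand-numbered ranks like those in the Czech mapping are easy to get wrong by a single digit. A small consistency check along the following lines might catch duplicated or skipped ranks. It is only a sketch: check_ranks and the sample table are hypothetical and not part of sort-lan.lua.

```lua
-- Hypothetical helper: report ranks that are used twice or skipped in a
-- mapping of the form { [character] = rank }.
local function check_ranks(mapping)
    local seen, highest, problems = {}, 0, {}
    for char, rank in pairs(mapping) do
        if seen[rank] then
            problems[#problems + 1] = "duplicate rank " .. rank
        end
        seen[rank] = char
        if rank > highest then
            highest = rank
        end
    end
    for rank = 1, highest do
        if not seen[rank] then
            problems[#problems + 1] = "missing rank " .. rank
        end
    end
    table.sort(problems)   -- pairs() has no fixed order, so sort the report
    return problems
end

-- Sample data only: rank 2 is used twice and rank 3 is never assigned.
local sample = { ["s"] = 1, ["š"] = 2, ["t"] = 2, ["ť"] = 4 }
local problems = check_ranks(sample)
for _, message in ipairs(problems) do
    print(message)
end
```

For a mapping that leaves gaps on purpose, as the English one does, the "missing rank" part of the check would have to be switched off.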




[NTG-context] sort-lan.lua nitpicks and sorting

2010-05-02 Thread Philipp Gesang
Hi again,


1. In sort-lan.lua, line 101 should read «['r'] = r», and line 144
«['r'] = 26, -- r».

2. Although I read the disclaimer about said file being “preliminary and
incomplete” -- is there some rationale behind the range of integers for
each language mapping? The mapping for English goes from 1 to 51,
interleaving 2 integers for each letter (which is odd because it should
start from index 3 with “a”, shouldn't it?), while the Czech one goes
from 1 to 40 without skipping, Finnish and Austrian from 1 to 58. 

  What about mapping them onto a larger but common scale that would
alleviate multilingual sorting so that the alphabetical representation
of the phoneme /a/ maps to the same value over different languages?†
E.g.
  [a] = 3, -- in a Latin mapping,
  [α] = 3, -- in Greek mapping,
  [а] = 3, -- in a Russian mapping.

3. Is it intended that the digraph “ch” resolves (temporarily) to U+FF01
(http://www.fileformat.info/info/unicode/char/ff01/index.htm) according
to line 72?

Feel free to state more general opinions on the sorting topic as I am
playing with different ways of sorting my bibliography. I would be glad
of any advice.


Philipp

†   I know this is impractical for many writing systems and even within
the set of Latin or Greek based alphabets it largely depends on a given
purpose how much precision you need in sorting.
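
On the digraph question in point 3: one plausible reading is that a single placeholder character stands in for “ch” so that it can receive a rank of its own. The Lua sketch below illustrates that technique under this assumption. The names CH, rank, sort_key and less are made up for the example, and the four-letter rank table is a toy fragment, not the Czech mapping from sort-lan.lua.

```lua
-- Assumption: U+FF01 (fullwidth exclamation mark) is used as a one-character
-- stand-in for the digraph "ch"; written here as its UTF-8 bytes EF BC 81.
local CH = "\239\188\129"

-- Toy fragment of a Czech-style scale: "ch" sorts between "h" and "i".
local rank = { ["g"] = 1, ["h"] = 2, [CH] = 3, ["i"] = 4 }

-- Replace the digraph by the placeholder, then map each UTF-8 character
-- to its rank.
local function sort_key(word)
    local key = {}
    local folded = word:gsub("ch", CH)
    for c in folded:gmatch("[\1-\127\194-\244][\128-\191]*") do
        key[#key + 1] = rank[c] or 0   -- unknown characters sort first
    end
    return key
end

-- Compare two words rank by rank; a shorter word wins a tie on its prefix.
local function less(a, b)
    local ka, kb = sort_key(a), sort_key(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then
            return ka[i] < kb[i]
        end
    end
    return #ka < #kb
end

local words = { "ih", "chi", "hi", "gh" }
table.sort(words, less)
print(table.concat(words, " "))   -- gh hi chi ih
```

A plain byte-wise sort would put “chi” first. Folding the digraph into one character before ranking is what moves it behind “hi”.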


-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

