Re: [NTG-context] (again) index sorting of accented characters

2017-05-01 Thread Pablo Rodriguez

On 04/29/2017 05:36 PM, Schmitz Thomas A. wrote:
> [...]
> Who is going to profit from this long discussion?

Sorry for having abused your help and your time, Thomas.

I’m afraid I cannot explain such a basic issue clearly.

Many thanks again for your kind help,

Pablo
-- 
http://www.ousia.tk
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] (again) index sorting of accented characters

2017-04-29 Thread Schmitz Thomas A.

> On 29. Apr 2017, at 16:51, Pablo Rodriguez wrote:
> 
> An index with classical Greek words (or names) that follows the same
> principle as in German, English or Dutch: word sorting is the same as in
> most important dictionaries.
> 
> This is the main reason for having it as a default.

Sorry, but I still don't see your point here. 

1. You refer to “practice over centuries.” Can you point me to a traditional 
index that contains an entry such as ἐκτὸς (from your example file)?

2. Sorting in ConTeXt may be used for more purposes than for printed books. I 
use it to analyze the content of my TEI xml files.

3. You refer to a new edition of LSJ in 2017. If it sorted words the way you 
suggest, with every possible morphological form as its own entry, you would 
think that was “flawed” too.

So I’m sorry, I still don’t get your point: this is about your personal 
preference, and I still don’t see why you are pushing so hard to have this 
preference as a default. This really is a corner case that is of interest to so 
few users that I wouldn’t consider it a good use of a developer’s time. It’s 
easy to achieve what you want (have you looked into the luatex solution I 
provided?). What is insufficient about it? Who is going to profit from this 
long discussion?

Thomas

Re: [NTG-context] (again) index sorting of accented characters

2017-04-29 Thread Pablo Rodriguez

On 04/29/2017 01:42 PM, Schmitz Thomas A. wrote:
>> Could you confirm that the right word order is the second list in this
>> message instead of the first one that ConTeXt generates by default?
> 
> No, I don't see why yours should be “right” and the order that is
> produced now should be “wrong.” It really depends on the purpose of your
> sorting.

Sorry, Thomas, I’m afraid I don’t get your point here.

I mean, if alphabetic sorting makes any sense at all, it is for sorting
index and dictionary entries. (If not, please tell me what I am missing
here.)

Imagine that LSJ is edited again in 2017. You purchase the paper edition
and you notice that vowels are sorted considering their diacritics too
(resulting in cases such as ἅλς placed after ἁμαρτία). Wouldn’t you
think that sorting is somehow “flawed” in that new edition?

The point I’m trying to make is that this isn’t about my personal
preferences, but about conventions used for centuries.

> I don’t know who contributed the current code to sort-ini.lua, but it
> makes consistent choices and produces a possible order. It’s not the
> order you would prefer, granted. It’s not the order I would prefer,
> granted again.
Hans kindly provided it
(https://mailman.ntg.nl/pipermail/ntg-context/2017/088340.html) in response
to a request of mine.

After using it, I realized that word order wasn’t right, although I
didn’t understand why replacements were needed. Or why replacements
required an unaccented Greek vowel and a Latin letter.

Hans replied that some order was always needed. I needed more samples to
realize that this wasn’t what I wanted.

> But what is the purpose of pushing so hard to have your favorite 
> order included as default? You know what to do to have this order,
> and that’s all that’s important for you.
This isn’t about my favorite order. This is about indices of (ancient)
Greek names or words.

German has five sorting criteria (de, Duden, two DIN and de-AT), but why
is the default criterion for sorting (ancient) Greek foreign to centuries
of practice?

That being said, I don’t think that Hans intended to establish a new sorting
criterion. The whole problem was that I couldn’t explain this issue better.

> Other users may have different priorities (witness the long list in
> sort-ini.lua: someone went to great lengths to define this order). So I
> still don’t see what you’re trying to accomplish.

An index with classical Greek words (or names) that follows the same
principle as in German, English or Dutch: word sorting is the same as in
most important dictionaries.

This is the main reason for having it as a default.

I hope it is clear now,

Pablo
-- 
http://www.ousia.tk

Re: [NTG-context] (again) index sorting of accented characters

2017-04-29 Thread Pablo Rodriguez

On 04/28/2017 06:27 PM, Florian Grammel wrote:
>>> Have you played with the different "methods" defined in sort-ini.lua, 
>>> lines 96-103?
>>
>> Many thanks for your reply, Thomas.
>>
>> The right values seem to be {zm, zc}. This works fine with Spanish and
>> French, […]
> 
> Even with {zm, zc} or any of the predefined methods I don’t really think
> it is working completely as expected, even though the problem might just
> occur in very special cases:

Hi Florian,

a simpler sample would be:

\mainlanguage[es]
\setupregister[language=es, method={zm, zc}]
\starttext
\startTEXpage[offset=2em]
\index{cómodo}
\index{comodos}
\index{cómoda}
\placeindex
\stopTEXpage
\stoptext

I know that "comodos" isn’t a word in Spanish. But it should be the last
word in the sorting.

> Is there a way to get this result with "methods" or would I need to
> modify the sort-rules?

I think that sort-lan.lua might be wrong here. Let me explain why.

German ("de-DE") has the following code:

replacements = {
{ "ä", 'ae' }, { "Ä", 'Ae' },
{ "ö", 'oe' }, { "Ö", 'Oe' },
{ "ü", 'ue' }, { "Ü", 'Ue' },
{ "ß", 's'  },
},

This is to get umlaut forms and Eszett sorted as ae, Ae, oe, Oe, ue, Ue
and s (although I wonder whether ß shouldn’t be replaced with ss).

Austrian German ("de-AT") doesn’t contain these replacements.
Umlaut-forms are given different entries.
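
To make the effect of such a replacement table concrete, here is a minimal
stand-alone sketch in Python. It is only a model of the principle, not
ConTeXt's sorter: the name de_sort_key and the example words are invented
for illustration, and the table simply copies the de-DE pairs quoted above
(including ß mapped to a single s).

```python
# Stand-alone model of a replacement table; NOT ConTeXt's implementation.
# The pairs are copied from the de-DE entry quoted above.
DE_REPLACEMENTS = [
    ("ä", "ae"), ("Ä", "Ae"),
    ("ö", "oe"), ("Ö", "Oe"),
    ("ü", "ue"), ("Ü", "Ue"),
    ("ß", "s"),
]

def de_sort_key(word):
    """Apply every replacement, then fold case so capitals do not sort first."""
    for old, new in DE_REPLACEMENTS:
        word = word.replace(old, new)
    return word.casefold()

# Example words chosen for this sketch; they do not come from the thread.
words = ["Affe", "Ärger", "Adler", "Öl", "Ofen"]
ordered = sorted(words, key=de_sort_key)
print(ordered)
```

With these keys "Ärger" is compared as "aerger" and therefore lands between
"Adler" and "Affe", and "Öl" (compared as "oel") lands before "Ofen".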

I guess if we need a similar behavior for Spanish, replacements of
accented glyphs should be created, such as in:

replacements = {
{ "á", 'a' }, { "Á", 'A' }, { "é", 'e' }, { "É", 'E' },
{ "í", 'i' }, { "Í", 'I' }, { "ó", 'o' }, { "Ó", 'O' },
{ "ú", 'u' }, { "Ú", 'U' }, { "ü", 'u' }, { "Ü", 'U' },
},

So you get the right word order:

cómoda
cómodo
comodos

But Hans has to confirm the issue first.

Just in case it helps,

Pablo
-- 
http://www.ousia.tk

Re: [NTG-context] (again) index sorting of accented characters

2017-04-29 Thread Schmitz Thomas A.

> On 29. Apr 2017, at 13:10, Pablo Rodriguez wrote:
> 
> I don’t know why "α" isn’t the first in sorting, but it is clear that
> letters with different diacritical marks are considered as different
> letters for word sorting.
> 
> Could you confirm that the right word order is the second list in this
> message instead of the first one that ConTeXt generates by default?

No, I don't see why yours should be “right” and the order that is produced now 
should be “wrong.” It really depends on the purpose of your sorting. I don’t 
know who contributed the current code to sort-ini.lua, but it makes consistent 
choices and produces a possible order. It’s not the order you would prefer, 
granted. It’s not the order I would prefer, granted again. But what is the 
purpose of pushing so hard to have your favorite order included as default? You 
know what to do to have this order, and that’s all that’s important for you. 
Other users may have different priorities (witness the long list in 
sort-ini.lua: someone went to great lengths to define this order). So I still 
don’t see what you’re trying to accomplish.

Thomas

Re: [NTG-context] (again) index sorting of accented characters

2017-04-29 Thread Pablo Rodriguez

On 04/27/2017 11:08 PM, Thomas A. Schmitz wrote:
> Two remarks:
> 
> 1. I'm not sure what you're looking for.

Sorry, Thomas, this is purely a question of word order. It isn’t about
which forms are selected for any existing or possible index.

This is my sample:

\setupbodyfont[dejavu]
\setupregister[language=gr, method={zm, zc}]
\starttext
\startTEXpage[offset=2em]
\index{ἁμαρτάνω}
\index{ἁστρονόμος} % I know breathing is wrong (only for testing)
\index{Ἀπόλλων}
\index{Ἀσκληπιός}
\index{ἅπαξ}
\index{αἰεί}
\index{πᾶσα}
\index{πᾶς}
\placeindex
\stopTEXpage
\stoptext

This is the word order I get with current beta:

Ἀπόλλων
Ἀσκληπιός
αἰεί
ἁμαρτάνω
ἁστρονόμος
ἅπαξ
πᾶσα
πᾶς

The right word order should be:

αἰεί
ἁμαρτάνω
ἅπαξ
Ἀπόλλων
Ἀσκληπιός
ἁστρονόμος
πᾶς
πᾶσα

To get the right sorting, I have to apply the following patch:
http://www.ousia.tk/grc-replacements.diff.

These replacements are required, from what I understand, because with
the default settings "ἀ" is replaced with "αf", "α" with "αa", "ἁ" with
"αg" and "ἅ" with "αl".

I don’t know why "α" isn’t the first in sorting, but it is clear that
letters with different diacritical marks are considered as different
letters for word sorting.
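
The second list above can be reproduced outside ConTeXt by sorting on the
letters alone. The following Python sketch is only an illustration under
stated assumptions: it does not show how sort-lan.lua or sort-ini.lua work,
and the helper name strip_diacritics is made up for this example. It
decomposes each word (Unicode NFD), drops the combining marks, folds case
(which also maps final sigma to σ), and sorts on what remains.

```python
import unicodedata

def strip_diacritics(word):
    """Sort key: base letters only, case-folded (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Breathings, accents and the circumflex are nonspacing marks ("Mn").
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.casefold()

# The entries from the sample file above, in source order.
words = [
    "ἁμαρτάνω", "ἁστρονόμος", "Ἀπόλλων", "Ἀσκληπιός",
    "ἅπαξ", "αἰεί", "πᾶσα", "πᾶς",
]

ordered = sorted(words, key=strip_diacritics)
for word in ordered:
    print(word)
```

Under these assumptions the printed order is αἰεί, ἁμαρτάνω, ἅπαξ, Ἀπόλλων,
Ἀσκληπιός, ἁστρονόμος, πᾶς, πᾶσα, that is, breathings, accents and
capitalization play no role in the comparison.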

Could you confirm that the right word order is the second list in this
message instead of the first one that ConTeXt generates by default?

Many thanks for your help,

Pablo
-- 
http://www.ousia.tk

Re: [NTG-context] (again) index sorting of accented characters

2017-04-28 Thread Florian Grammel

>> Have you played with the different "methods" defined in sort-ini.lua, 
>> lines 96-103?
> 
> Many thanks for your reply, Thomas.
> 
> The right values seem to be {zm, zc}. This works fine with Spanish and
> French, […]


Even with {zm, zc} or any of the predefined methods I don’t really think it is 
working completely as expected, even though the problem might just occur in 
very special cases:

\setupregister[method={zm, zc}]
\starttext
\startTEXpage[offset=2em]
\index{káv}
\index{kav}
\index{káva}
\index{kava}
\index{káf}
\index{kaf}
\index{káfa}
\index{kafa}
\index{kaka}
\index{káka}
\placeindex[language=es]
\stopTEXpage
\stoptext

gives

kaf
kafa
káf
káfa
kaka
káka
kav
kava
káv
káva

though I’d think it would be correct/logical to sort: 

kaf
káf
kafa
káfa
kaka
káka
kav
káv
kava
káva

Is there a way to get this result with "methods" or would I need to modify the 
sort-rules?

(The example words are not Spanish, obviously, but Icelandic/Faroese, which I 
am trying to correct/set up. But I’d imagine that the same would be the desired 
behaviour for Spanish and languages with similar traditions, too, wouldn’t it?)
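
The second list corresponds to a two-level comparison: first compare the
words with accents ignored, and only when they are identical let the
accented form follow the plain one. A minimal Python sketch of that rule,
independent of ConTeXt (the function names base_form and two_level_key are
invented here, and nothing is claimed about what the zm/zc methods do
internally):

```python
import unicodedata

def base_form(word):
    """Word with combining accents removed (primary sort level)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def two_level_key(word):
    # Primary level: letters without accents.
    # Secondary level: the composed word itself, which only matters on ties
    # and puts "a" before "á" because of their code points.
    return (base_form(word), unicodedata.normalize("NFC", word))

# The entries from the sample above, in source order.
words = ["káv", "kav", "káva", "kava", "káf", "kaf", "káfa", "kafa", "kaka", "káka"]

ordered = sorted(words, key=two_level_key)
print(ordered)
```

This yields kaf, káf, kafa, káfa, kaka, káka, kav, káv, kava, káva. The
design choice is that the accent acts purely as a tie-breaker, so it can
never move a word past another word that differs in its base letters.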

Best
Florian.





Florian Grammel

Copenhagen, Denmark


Re: [NTG-context] (again) index sorting of accented characters

2017-04-27 Thread Thomas A. Schmitz


On 04/27/2017 10:26 PM, Pablo Rodriguez wrote:

> Could you please confirm the issue?
> 
> Many thanks for your help,



Two remarks:

1. I'm not sure what you're looking for. Do you really want an index 
that sorts every form of every word as an entry? So that ἐμήν and ἐμοῖς 
are different words and not occurrences of the same entry? If that's 
really what you're looking for, you may want to look into a very handy 
luatex function: characters.shaped() returns the unaccented characters 
of a unicode string, see chapter 11.2 of cld-mkiv.pdf. Define your own 
command that uses this lua function to index the unaccented word; that's 
not too hard.


2. If, on the other hand, you want to build a real index that will sort 
morphological forms under their head words, you will have to give the 
sort term explicitly, and then you don't have to rely on ConTeXt's 
abilities to sort accented Greek because you will have something like 
ἐμήν\index{εμοσ} in your text. For the time being, there's no software 
that can reliably parse ancient Greek, I'm afraid.
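
The second approach can be pictured as follows. This Python sketch is a
stand-alone illustration, not ConTeXt code: the pairing of each form with
its head word is done by hand (as the remark says it must be), and the
names sort_term, occurrences and index are invented for the example. The
forms ἐμήν and ἐμοῖς are the ones mentioned above; πᾶσα under πᾶς is added
only to have a second entry.

```python
import unicodedata
from collections import defaultdict

def sort_term(word):
    """Unaccented, case-folded sort term for a head word (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.casefold()

# (form as it occurs in the text, head word assigned by hand)
occurrences = [("ἐμήν", "ἐμός"), ("πᾶσα", "πᾶς"), ("ἐμοῖς", "ἐμός")]

# Collect every form under its head word.
index = defaultdict(list)
for form, head in occurrences:
    index[head].append(form)

# Head words are ordered by their unaccented sort term.
heads = sorted(index, key=sort_term)
for head in heads:
    print(head, "->", ", ".join(index[head]))
```

For ἐμός the computed sort term is εμοσ, the same string used as the
explicit sort term in the remark above.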


Thomas

Re: [NTG-context] (again) index sorting of accented characters

2017-04-27 Thread Pablo Rodriguez

On 04/27/2017 08:51 PM, Thomas A. Schmitz wrote:
> On 04/27/2017 07:21 PM, Pablo Rodriguez wrote:
>> I mean, if this is the way, I have two more patches for two other
>> languages in which I have indices.
>>
>> And if I’m wrong, I would like to know how to get the right word sorting in
>> registers.
> 
> Have you played with the different "methods" defined in sort-ini.lua, 
> lines 96-103?

Many thanks for your reply, Thomas.

The right values seem to be {zm, zc}. This works fine with Spanish and
French, but ancient Greek is more problematic.

I have a source file, http://www.ousia.tk/grc-index.tex. Standard
sorting gives the following results:
http://www.ousia.tk/grc-index-standard.pdf#page=3.

When I add replacements (http://www.ousia.tk/grc-replacements.diff) to
sort-lan.lua, sorting order is right
(http://www.ousia.tk/grc-index-modified.pdf#page=3).

Could you please confirm the issue?

Many thanks for your help,

Pablo
-- 
http://www.ousia.tk

Re: [NTG-context] (again) index sorting of accented characters

2017-04-27 Thread Thomas A. Schmitz


On 04/27/2017 07:21 PM, Pablo Rodriguez wrote:

> I mean, if this is the way, I have two more patches for two other
> languages in which I have indices.
> 
> And if I’m wrong, I would like to know how to get the right word sorting in
> registers.


Have you played with the different "methods" defined in sort-ini.lua, 
lines 96-103?


Thomas

[NTG-context] (again) index sorting of accented characters

2017-04-27 Thread Pablo Rodriguez

Dear list,

sorry for bothering you again with this issue, but I need to have indices in
my documents.

I have the following sample:

\mainlanguage[es]
\setupregister[method=default]
\starttext
\startTEXpage[offset=1em]
\index{ámame}\index{arisco}\index{ándrago}
\index{antonia}\index{antón}
\placeindex
\stopTEXpage
\stoptext

The resulting word order is:

antonia
antón
arisco
ámame
ándrago

The right word order is:

ámame
ándrago
antón
antonia
arisco

In Spanish, as in other languages, an accented letter sorts the same as
its unaccented counterpart.

I got the right word order by adding these replacements in sort-lan.lua:

replacements = {
{ "á", "a" }, { "é", "e" }, { "í", "i" }, { "ó", "o" },
{ "ú", "u" }, { "ü", "u" },
},
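
The principle behind such a table can be checked outside ConTeXt. The
Python sketch below is only a model, not the ConTeXt sorter: the names
REPLACEMENTS and sort_key are invented for the example, and the table maps
each accented vowel to its own base letter. It sorts the five entries of
the sample twice, once by plain code point and once through the table.

```python
import unicodedata

# Each accented vowel is compared as its unaccented counterpart.
REPLACEMENTS = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u"}

def sort_key(word):
    """Hypothetical sort key: the word with the replacements applied."""
    composed = unicodedata.normalize("NFC", word)
    return "".join(REPLACEMENTS.get(ch, ch) for ch in composed)

# The entries from the sample above, in source order.
words = ["ámame", "arisco", "ándrago", "antonia", "antón"]

by_codepoint = sorted(words)                 # accented letters after "z"
by_replacement = sorted(words, key=sort_key)

print(by_codepoint)
print(by_replacement)
```

The plain code-point order comes out as antonia, antón, arisco, ámame,
ándrago, which happens to match the order reported above, while the
replacement-based order is ámame, ándrago, antón, antonia, arisco. Whether
ConTeXt arrives at its default order in the same way is not something this
sketch can show.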

Could anyone explain to me whether this is the right way of doing it?

I mean, if this is the way, I have two more patches for two other
languages in which I have indices.

And if I’m wrong, I would like to know how to get the right word sorting in
registers.

Many thanks for your help,

Pablo
-- 
http://www.ousia.tk