[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-08-04 Thread mxn
mxn added a comment. The approach I was forced to take with Vietnamese (separate lexemes per word per writing system, “translations” from one writing system to another) has some downsides. For one thing, the criteria for a translation between vi and vi-Hani must be stricter than the

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-08-04 Thread mrephabricator
mrephabricator added a comment. This may be verging on pedantry, but I will say that the principle of "one form per combination of grammatical features" does not sound broadly applicable enough to follow for each language. Maybe I am missing something and this is just a convention for

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-25 Thread LucasWerkmeister
LucasWerkmeister added a comment. In T236593#8093121, @C933103 wrote: > As an English example, some religious people might refuse to write the name "God" out directly as it is as this would constitute idolatry. For this we can tag it

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-25 Thread AGutman-WMF
AGutman-WMF added a comment. @Asaf Insofar as two forms are considered distinct lexemes, it is probably the case that not all statements hold for both forms (e.g. the pronunciation may be different, and possibly other details such as etymology). If the two forms are close enough (e.g. just

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-22 Thread Asaf
Asaf added a comment. I apologize if I missed something, but if we do end up separating into different *lexemes*, how do we retain the value of all the descriptive work done on one lexeme (presumably the more common or standard form) that describes the form in the other lexeme equally well?

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-22 Thread AGutman-WMF
AGutman-WMF added a comment. @LucasWerkmeister I agree with you that if two variants have two different pronunciations, they should probably be split into two different lexemes (in general, I think we should avoid having multiple forms with the same grammatical features within one lexeme).

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-20 Thread C933103
C933103 added a comment. In T236593#8092471, @LucasWerkmeister wrote: > It’s still not clear to me which problem the `-x-Q123-1` patch is trying to solve. Several languages have been mentioned in this task, but which of them would

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-20 Thread LucasWerkmeister
LucasWerkmeister added a comment. It’s still not clear to me which problem the `-x-Q123-1` patch is trying to solve. Several languages have been mentioned in this task, but which of them would benefit from this system? I feel like for several of them, we’ve already reached the conclusion

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-16 Thread Ijon
Ijon added a comment. @AGutman-WMF - yes, I think your approach makes sense. It would be good to auto-suggest those custom language codes in data-entry. TASK DETAIL https://phabricator.wikimedia.org/T236593 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-16 Thread Ijon
Ijon added a comment. @daniel - this would work very well for Hebrew, for example, where the two orthographies have a formal name known to all speakers, but less well when the variations are due to lack of standardization, as in the Bangla case mentioned by @Mahir256.

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-07-12 Thread AGutman-WMF
AGutman-WMF added a comment. I believe the current situation, where multiple forms are added to account for spelling variations, goes against the spirit of the lexicographical data model, and in particular the idea that there should be exactly one form for each combination of grammatical
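
The "one form per combination of grammatical features" principle named in this comment can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `duplicate_feature_sets`, the form ids (`F1`, `F2`, ...) and the feature ids (`Q1`, `Q2`, ...) are hypothetical placeholders, and nothing here is taken from Wikibase or from the patch discussed in the task.

```python
from collections import defaultdict


def duplicate_feature_sets(forms):
    """Group form ids by their set of grammatical features.

    `forms` is an iterable of (form_id, features) pairs, where `features`
    is any iterable of feature ids. Returns only the groups that contain
    more than one form, i.e. the cases that break the
    "one form per combination of grammatical features" convention.
    """
    groups = defaultdict(list)
    for form_id, features in forms:
        # frozenset makes the feature combination order-independent and hashable.
        groups[frozenset(features)].append(form_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical lexeme: F1 and F2 are spelling variants sharing the same features.
example_forms = [
    ("F1", ["Q1", "Q2"]),
    ("F2", ["Q2", "Q1"]),
    ("F3", ["Q3"]),
]
conflicts = duplicate_feature_sets(example_forms)
```

Under this sketch, two spelling variants entered as separate forms show up as one conflicting group, which is the situation the comment objects to.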

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-30 Thread AGutman-WMF
AGutman-WMF added a comment. I've now created a patch that does allow associating several spelling variants with the same private language code. If the patch gets merged, it will allow associating spelling variants of forms or lexemes with codes

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-28 Thread C933103
C933103 added a comment. In T236593#5610378, @daniel wrote: > I recall that we had long discussions about this when initially deciding on the data model. In technical terms, the question was whether we would allow only a single

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-27 Thread mxn
mxn added a comment. In T236593#8026331, @mxn wrote: > If it is so important that forms not be used for orthographic variants of a non-alphabetic writing system, then the alternative approach would be to store the //quốc ngữ// and

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-27 Thread Fnielsen
Fnielsen added a comment. @AGutman-WMF Spelling variants in Ordregister are each associated with a specific identifier. If the spelling variants are just a representation, then it is not possible to associate the identifier with the specific representation (unless a new property is

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread mxn
mxn added a comment. In T236593#8025472, @AGutman-WMF wrote: > @mxn If these are purely orthographic variants (i.e. the pronunciation is the same) I would list them under a single lexeme. And in that case, the most natural way would

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread AGutman-WMF
AGutman-WMF added a comment. @Fnielsen as far as I see, each variant spelling forms its own set of inflected forms, so you have a paradigm related to //mørklægge// and another paradigm related to the variant spelling //mørkelægge//. So conceptually you don't have a single list of forms, but

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread Fnielsen
Fnielsen added a comment. @AGutman-WMF https://www.wikidata.org/wiki/Lexeme:L348129 does have the same inflection. The Ordregister is presumably also for machines and lumps forms together. For instance, https://ordregister.dk/id/COR.53473/ corresponding to

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread AGutman-WMF
AGutman-WMF changed the task status from "Open" to "In Progress". AGutman-WMF added a comment. I'm working on a patch to allow multiple forms associated with the same private language code.

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread AGutman-WMF
AGutman-WMF added a comment. @Fnielsen given that the pronunciation of these forms is in fact different (according to the X-SAMPA notation), and each has its own distinct inflection set, I would treat these as two distinct (synonymous) lexemes. I don't see the advantage of lumping all these

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread Fnielsen
Fnielsen added a comment. I have entered this Danish lexeme today: https://www.wikidata.org/wiki/Lexeme:L348129. In authoritative works https://ordnet.dk/ddo/ordbog?query=m%C3%B8rkel%C3%A6gge=Den+Danske+Ordbog , https://dsn.dk/ordbog/ro/moerkelaegge/ and https://ordregister.dk/ they are

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread AGutman-WMF
AGutman-WMF added a comment. @mxn If these are purely orthographic variants (i.e. the pronunciation is the same) I would list them under a single lexeme. And in that case, the most natural way would be to list them as spelling variants rather than distinct forms. To attach statements

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread mxn
mxn added a comment. In T236593#8017255, @mxn wrote: > In T236593#8015993, @AGutman-WMF wrote: > >> The ideal solution would be to allow (in the language code validator)

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-24 Thread AGutman-WMF
AGutman-WMF added a comment. In T236593#8016636, @Fnielsen wrote: > In Danish, we are currently using multiple forms and linking them with https://www.wikidata.org/wiki/Property:P8530 See also the discussion at

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-21 Thread mxn
mxn added a comment. In T236593#8015993, @AGutman-WMF wrote: > The ideal solution would be to allow (in the language code validator) arbitrary language codes including a rank identifier. For instance, for Vietnamese one should be able

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-21 Thread Fnielsen
Fnielsen added a comment. In Danish, we are currently using multiple forms and linking them with https://www.wikidata.org/wiki/Property:P8530 See also the discussion at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Alternative_form

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-21 Thread AGutman-WMF
AGutman-WMF added a comment. The ideal solution would be to allow (in the language code validator) arbitrary language codes including a rank identifier. For instance, for Vietnamese one should be able to use codes such as vi-x-Q8201-1, vi-x-Q8201-2 etc. Currently this doesn't pass the
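
The code shape proposed here (a base language, a private-use `-x-Q…` part, then a numeric rank) could be recognised roughly as follows. This is a minimal sketch, not the actual Wikibase language code validator: the pattern `PRIVATE_VARIANT_CODE` and the helper `parse_variant_code` are hypothetical names, and the assumptions that the base subtag is two or three lowercase letters and that the rank is optional are mine.

```python
import re

# Hypothetical pattern (an assumption, not the real validator):
#   <lang>   two or three lowercase letters, e.g. "vi"
#   -x-      private-use marker
#   <item>   a Wikidata item id such as Q8201
#   -<rank>  optional positive integer distinguishing spelling variants
PRIVATE_VARIANT_CODE = re.compile(
    r"^(?P<lang>[a-z]{2,3})-x-(?P<item>Q[1-9][0-9]*)(?:-(?P<rank>[1-9][0-9]*))?$"
)


def parse_variant_code(code):
    """Return (language, item id, rank) for a matching code, else None.

    The rank is an int, or None when the code carries no rank suffix.
    """
    match = PRIVATE_VARIANT_CODE.match(code)
    if match is None:
        return None
    rank = match.group("rank")
    return match.group("lang"), match.group("item"), int(rank) if rank else None


first = parse_variant_code("vi-x-Q8201-1")
second = parse_variant_code("vi-x-Q8201-2")
unranked = parse_variant_code("vi-x-Q8201")
not_private = parse_variant_code("vi-Hani")
```

With a parser like this, `vi-x-Q8201-1` and `vi-x-Q8201-2` share the same language and item but differ in rank, which is how several spelling variants could sit side by side under one private code.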

[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

2022-06-21 Thread mxn
mxn added a comment. Nearly every Vietnamese lexeme would be affected by this issue, because one of the two writing systems for the language is phonetic while the other is phonosemantic, resulting in a