Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

Jonathan Washington Sun, 14 Jun 2020 11:50:07 -0700

Hi all,

I think Tanmai's decision to focus on the part of this proposal that no one
seems to disagree about is the right approach for now.  Opinions about how
to transport other information through the pipeline were getting strong,
and feelings were getting hurt.


That said, I think we should keep this conversation open, and not simply
archive it.  But it should be discussed with less urgency (my apologies for
raising the urgency of the question—I see now that that probably made the
situation worse).  I believe that the community can come to consensus on an
approach, but it may take a while.  One more or less active core developer
or language developer objecting strongly to any given approach is enough to
tell us we have to think a lot harder about things.  It would be great if
we could keep it constructive and respectful, and not give up on the
conversation just because an expressed view isn't being well received.  The
fact that the view is being expressed alone is not enough either, of
course—this question can only be resolved through sustained good-faith
argumentation, and that will take some energy and commitment by invested
parties.

--
Jonathan

On Sun, 14 Jun 2020 at 09:08, Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> No, there’s no new election happening, least of all over a design choice in
> my project. We’re an open source community with experienced people, and it’s
> surprising to me that we’re having such a hard time agreeing on this.
>
> Francis is one of our most experienced members and the PMC president and
> as such does have an important say in design decisions and major directions
> that Apertium takes. There have been multiple proposals and Francis has
> made clear his objections. If at this point he doesn’t like it then we
> should stop proposing a design again and again. Keep in mind I would also
> say the same for any other experienced member who really objected to a
> design choice.
>
> Francis sees the need for wordbound blanks for markup handling, and that
> was a major part of the original proposal, so for now we will go ahead
> with that. Before implementing anything, I will analyse what Wikimedia
> needs in terms of markup handling and how we plan to address it.
>
> We will archive the secondary tags discussion for now and once this is
> done I will make a list of people’s needs for reading bound information and
> we will analyse how best to address those needs.
>
> I think that due to the open nature of this solution we jumped the gun on
> the analysis part and instead of focusing on a problem to solve we started
> building up our solution based on problems we could face in the future.
>
> Point is, all this conflict isn’t really worth it. I’ve always been of the
> opinion that even if one person in the PMC objects to something it
> shouldn’t happen, and I’m sure a group of experts and I can find something
> we all agree on as a design decision.
>
> Peace ✌🏼
>
> Tanmai Khanna
>
> Sent from my iPhone
>
> On 14-Jun-2020, at 18:02, Samuel Sloniker <scoopgra...@gmail.com> wrote:
>
> 
> If we do have a new election, can we vote on the new bylaws first, so we
> use STV?
>
> On Sun, Jun 14, 2020, 04:01 Francis Tyers <fty...@prompsit.com> wrote:
>
>> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
>> > Message from Francis Tyers <fty...@prompsit.com> on Sun, 14 June
>> > 2020 at 10:32:
>> >
>> >> On 2020-06-13 23:18, Jonathan Washington wrote:
>> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com> wrote:
>> >>>
>> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
>> >>>>> Before anything, let me say that I like the proposal to enhance the
>> >>>>> pipeline with more data (including, but not limited to, the surface
>> >>>>> forms), so that we can properly do things that we're currently doing
>> >>>>> in veeeery hacky (to me) and definitely non-linguistic ways.
>> >>>>>
>> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
>> >>>>>> ^El/el<det><def><m><sg>$ ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$
>> >>>>>
>> >>>>> In this example, we "add" semantic information to the pipeline (and
>> >>>>> disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT,
>> >>>>> because "mango<n>" (the handle of a pan) and "mango_fruta<n>" are
>> >>>>> translated differently in Catalan. But this, in turn, forces every
>> >>>>> other language pair using Spanish to know about "mango_fruta<n>" even
>> >>>>> if the translation is the same as for "mango<n>".
>> >>>>>
>> >>>>
>> >>>> What is the problem here? That "mango" has two possible lemmas and
>> >>>> paradigms in Spanish?
>> >>>>
>> >>>> The way that I've treated that is to have mango¹ and mango², like in a
>> >>>> traditional dictionary. I don't think that this requires any further
>> >>>> information.
>> >>>
>> >>> I think Xavi's point is that there are a number of ways to approach
>> >>> this, and having the option of another stream to put this extra
>> >>> information could be one of them.  Imho, it is nicer in many ways
>> >>> than even having (very arbitrary) superscripts (that aren't really any
>> >>> better to have in a morphological analysis than _fruta).
>> >>>
>> >>
>> >> It's following what the lexicographers do:
>> >>
>> >> https://dle.rae.es/?w=mango
>> >>
>> >> So it's following a fairly established practice.
>> >>
>> >> Fran
>> >
>> > As far as I understand the "mango" issue, Xavi is contemplating the
>> > possibility of a semantic module which would add extra information
>> > about "mango" that may be used by other modules (especially the
>> > lexical selection one). This could be used for distinguishing between
>> > a handle and a fruit, but not only that: "mango" can be both the fruit
>> > and the plant. One could eventually add what kind of handle it is;
>> > e.g. in the RAE dictionary entry linked by Fran, the handle of a knife
>> > is specifically distinguished from other handles. As Xavi shows, this
>> > extra information could be added so that it can be ignored by pairs
>> > that don't need it. It seems clear that a solution based on being able
>> > to add any additional secondary information is more versatile than
>> > "_fruta", "_2" and the like.
>> >
>> > Moreover, in lexical selection we have lots of lists like "fruit",
>> > "building", "person", "device", etc. (and if we don't, it is because
>> > of a lack of time for writing them). It would be easier if a module
>> > like the one Xavi imagines could add this kind of information and it
>> > could be carried through the pipeline.
>> >
>> > I am not a technician, nor am I a computational linguist. I don't know, nor
>> > do I understand, the implications of Tanmai and Tino's proposals in
>> > terms of system performance. But, from the point of view of someone
>> > with some experience in developing Apertium language pairs, I would
>> > love some tool that would allow adding semantic information to the
>> > pipeline.
>> >
>> > Other kinds of contextual information that would also be useful to me
>> > are things like the type of publication (a chat between friends or a
>> > medical encyclopedia?), the dialect, the year of publication, etc.
>> > These would be very useful for lexical selection and, sometimes, for
>> > transfer rules.
>> >
>> > I don't know if this has helped the discussion at all or... whether
>> > I've completely missed the mark.
>> >
>>
>> Thanks for the comments, Hèctor. I think that this kind of information
>> could certainly be useful in the pipeline. But I think that determining
>> how it should be added and where it should be added is a separate issue.
>>
>> What would a "semantic tagging" module look like? Would it be rule-based?
>> Statistical? Where would the data come from? I could imagine using
>> Wikipedia to extract it.
>>
>> I have no objections to the development of a well-specified and
>> well-designed module for doing semantic tagging.
>>
>> Fran
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>

