Hi Hector,
Thanks for all your comments. I really appreciate them! :) I'll try to
respond to the best of my abilities:

When I claimed that "The girl ate his apple" is grammatically incoherent, I
meant the case where that sentence is the entire discourse. You're right
that a pronoun can refer to something in the real world that isn't present
in the discourse, but that kind of reference can't be resolved from text
alone, so we usually just ignore it.

Before I start answering the questions, I also want to point out that this
project is an attempt to build a tool that would normally rely on much
richer linguistic knowledge, without that knowledge, and to make it good
enough using the simple linguistic features that are available. Some of
what can and can't be done will only be found out experimentally, but I
included these ideas in my proposal so that we can make an informed
decision about whether something can be language independent or not.

1. Following this thought, let's talk about marking verbs with antecedents.
For dealing with zero pronouns, we *have to* mark the verbs with the
antecedents and hence it is something that will be a part of this tool.
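
Having the antecedent on the verb would also allow the comparison you
describe for reflexive possessives. As a rough illustration only (the
proposal does not specify this yet), here is a small Python sketch. The
function name, its parameters and the gender codes are made up for the
example. It returns lemmas for a singular possessor and ignores agreement
with the possessed noun (which is what turns "свой" into "своё" before
"яблоко"):

```python
from typing import Optional


def choose_russian_possessive(possessor: Optional[str],
                              verb_subject: Optional[str],
                              possessor_gender: str) -> str:
    """Pick the lemma of a Russian 3rd person singular possessive.

    possessor        -- antecedent resolved for the possessive ("su")
    verb_subject     -- antecedent marked on the verb ("comió")
    possessor_gender -- "f", "m" or "nt" (hypothetical codes)
    """
    # Same referent for the possessive and the verb: use the reflexive.
    if possessor is not None and possessor == verb_subject:
        return "свой"
    # Otherwise fall back to the ordinary possessive of the possessor.
    return "её" if possessor_gender == "f" else "его"


same = choose_russian_possessive("la chica", "la chica", "f")
other = choose_russian_possessive("el chico", "la chica", "m")
print(same, other)
```

Without an antecedent on the verb there is nothing to compare against, so
the reflexive form could never be selected.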

You're right in saying that it will be hard to capture the subject of a
verb without any configuration. However, that wasn't what I was trying to
do. *I decided to treat zero pronouns as literally zero pronouns.* Assume a
pronoun exists right before the verb and then perform anaphora resolution
on this zero pronoun. This keeps the tool language agnostic. If the results
are unsatisfactory, we can narrow things down and create language-specific
features to identify the subject :)
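
The zero-pronoun idea above can be sketched roughly as follows. This is
only an illustration in Python under strong assumptions: the `Token` class,
the coarse tags ("noun", "verb", "pron", "punct", "other") and the
"nearest preceding noun" resolution are placeholders I made up here, not
Apertium's tag set and not the scoring the final tool would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    surface: str                      # "" marks an inserted zero pronoun
    tag: str                          # coarse, made-up tag
    antecedent: Optional[str] = None  # filled in by resolve_pronouns


def insert_zero_pronouns(tokens: List[Token]) -> List[Token]:
    """Insert an empty pronoun before each verb that has no noun or
    pronoun between it and the previous verb or punctuation mark."""
    out: List[Token] = []
    has_nominal = False
    for tok in tokens:
        if tok.tag == "verb":
            if not has_nominal:
                out.append(Token("", "pron"))
            has_nominal = False       # the next verb needs its own subject
        elif tok.tag == "punct":
            has_nominal = False
        elif tok.tag in ("noun", "pron"):
            has_nominal = True
        out.append(tok)
    return out


def resolve_pronouns(tokens: List[Token]) -> List[Token]:
    """Placeholder resolution: link each pronoun to the nearest noun
    before it."""
    last_noun: Optional[str] = None
    for tok in tokens:
        if tok.tag == "noun":
            last_noun = tok.surface
        elif tok.tag == "pron":
            tok.antecedent = last_noun
    return tokens


sentence = [
    Token("La", "other"), Token("chica", "noun"), Token("está", "verb"),
    Token("aquí", "other"), Token(",", "punct"), Token("lleva", "verb"),
    Token("un", "other"), Token("vestido", "noun"), Token("rojo", "other"),
    Token(".", "punct"),
]
resolved = resolve_pronouns(insert_zero_pronouns(sentence))
zero = next(t for t in resolved if t.surface == "")
print(zero.antecedent)  # chica
```

The insertion rule is deliberately crude: an object noun with no
punctuation before the next verb would block the insertion, which is the
kind of case that would have to be checked experimentally.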

2. Identifying antecedents of adjectives (so to speak) will require
separate metrics, but these examples are exactly along the lines of what
I've been thinking, i.e. detecting relative clauses and moving them out of
the way to let the adjective recognise its antecedent. Apertium probably
gets "The lady with the book" right because "the book" is part of a PP,
which cannot be the subject of "is". Similarly, I will try to add relative
clause detection so that the clause is ignored and "nice" is connected to
"the lady".
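
A minimal Python sketch of "moving the modifier out of the way", again
under assumptions: the coarse tags ("noun", "prep", "rel", "ger", "cop",
...) and both function names are invented for this example, and a
modifier is simply taken to run from its opening word up to the copula.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str]  # (surface form, coarse made-up tag)

# Tags assumed to open a modifier of the subject noun:
# preposition, relative pronoun, gerund.
OPENERS = {"prep", "rel", "ger"}


def strip_modifiers(tokens: List[Tagged]) -> List[Tagged]:
    """Drop everything from a modifier opener up to the copula."""
    kept: List[Tagged] = []
    skipping = False
    for surface, tag in tokens:
        if tag in OPENERS:
            skipping = True
        elif tag == "cop":
            skipping = False
        if not skipping:
            kept.append((surface, tag))
    return kept


def adjective_antecedent(tokens: List[Tagged]) -> Optional[str]:
    """Return the nearest noun left of the copula once modifiers are gone."""
    last_noun: Optional[str] = None
    for surface, tag in strip_modifiers(tokens):
        if tag == "noun":
            last_noun = surface
        elif tag == "cop":
            return last_noun
    return None


relative = [("The", "det"), ("lady", "noun"), ("who", "rel"),
            ("reads", "verb"), ("the", "det"), ("book", "noun"),
            ("is", "cop"), ("nice", "adj")]
print(adjective_antecedent(relative))  # lady
```

Without the stripping step the nearest noun would be "book". The sketch
breaks as soon as the modifier itself contains a copula ("The lady who is
reading..."), so real relative clause detection needs more than this.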

3. "Tall" would get the correct adjective form if we could do anaphora
resolution for first and second person pronouns, but that is a lot more
complex than for third person pronouns. Correct me if I'm wrong, but first
and second person pronouns are usually resolved in the real world, and
their referents are rarely introduced earlier in the text. If you ask me, I
would leave those out for now. But you're right, it is interesting to think
about how to deal with them. Maybe in cases where the speaker introduces
themselves first, we should be able to attach that introduction to "I" in
"I am".

4. I was told that anaphora resolution is needed for Catalan as well, and
even if we use the same module for both, we still have to test how it
performs on each. But as mentioned in the proposal, I'll try to make the
anaphora tool as language agnostic as possible and will test it with
multiple pairs to see the results. If you have any suggestions right now
for pairs that need it, I can add them.

5. I'm using the Apertium Simpleton UI for macOS and for "La chica está aquí,
lleva un vestido rojo.", I get "The girl is here, spends a red dress"
(attaching a screenshot makes the email too big to send, so just take my
word for it :P ). I'm not sure why.

Thanks for all your questions and suggestions; they'll definitely help me
build a better tool. I really hope I was able to answer your questions
satisfactorily. If not, I apologise, and I would welcome a follow-up. It
will certainly help me even more. :)

On Sat, Mar 30, 2019 at 12:54 AM Hèctor Alòs i Font <hectora...@gmail.com>
wrote:

> Hi Tanmai,
>
> I won't be a mentor, but I asked for anaphora resolution in Apertium, so,
> if I am allowed, I'd like some clarification about the proposal (which, I
> think, is great - congrats).
>
> First of all, note that "The girl ate his apple" is not grammatically
> incoherent. Maybe she ate an apple given by a male friend of hers. Anaphora
> resolution is complicated i.a. because language is often ambiguous.
>
> 1. I've been thinking about the example
>
> La chica comió su manzana
>
> Let's suppose that the antecedent of "su" is "la chica".
> If the target language were a Slavic language or Esperanto, the
> selection would not only be between "his", "her" or "its", but would also
> include a reflexive possessive pronoun, e.g. in Russian Девушка съела своё
> яблоко, but not Девушка съела её яблоко. If using the proposal in
> http://wiki.apertium.org/wiki/Anaphora_resolution I'm not sure how we
> could deal with it. We would probably need to have a referent in the verb
> too, in order to be able to compare in the transfer rules whether the
> antecedent of "su" is also the antecedent of "comió".
>
> So, my point is: will the user be able to "configure" for which parts of
> speech should the antecedent be tracked? E.g. for the Catalan-Spanish pair
> I don't see any need to track the "antecedents" of verbs, but for e.g.
> Spanish to English it seems necessary for dealing with zero pronouns.
>
> (By the way, I am surprised that e.g. the subject of a verb can be tracked
> by a language-independent tool without any configuration. I really doubt
> this can be true.)
>
> 2. The examples in "Reflexive pronouns" and "Long distance agreement" seem
> very difficult. I'd propose a few simpler agreements:
> * The lady with the book is nice.
> * The lady reading the book is nice.
> * The lady who reads the book is nice.
> "Nice" should be feminine in Spanish/Catalan (currently it happens only in
> the first case)
> * The singers that sing sing well.
> Both "sing" should be p3pl in Spanish/Catalan, currently they are not
> ("Los cantantes que canta canta bien").
>
> 3. Let's accept that we will deal only with the 3rd person. It is too
> complicated to resolve:
> * I'm tall
> (gender?)
> * You are tall
> (gender? number?)
>
> 4. I cannot see why it should be useful to test the system with the
> Spanish-English and Catalan-English pairs. As for the anaphora, if I am not
> wrong, Catalan and Spanish are twins. One pair of the two seems enough.
>
> 5. One detail: the current translation of
> La chica está aquí, lleva un vestido rojo.
> is:
> The girl is here, carries a red dress.
>
> Best,
> Hèctor
>
> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Fri, 29 March
> 2019 at 15:48:
>
>> Hi,
>> I have submitted a draft for review for the project "Anaphora Resolution"
>> for GSoC 2019. The project will also include a tool for resolution of
>> agreement for adjectives in Spanish, Catalan and other languages that need
>> it.
>>
>> You can find the proposal here:
>> http://wiki.apertium.org/wiki/User:Khannatanmai
>>
>> If anyone has any comments, suggestions, criticism, ideas, I would
>> really appreciate if you let me know as it'll help me make a stronger
>> proposal and a better tool for Apertium during GSoC 2019.
>>
>> Thanks and Regards,
>> Tanmai Khanna
>> IRC: khannatanmai
>>
>> --
>> *Khanna, Tanmai*
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>


-- 
*Khanna, Tanmai*