Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-31 Thread Tanmai Khanna
Hey Hector,
Thanks again for all the comments!
I would also recommend that the mentors go through this email, as it
explores an important part of Anaphora Resolution, i.e. Zero Anaphora
Resolution.

> Also, in a language like Spanish there are quite a lot of time
> constructions like
> El lunes irá al médico
> (word by word translation: Monday will-go to-the doctor)
> It is very likely that "lunes" will be chosen as the subject of "irá".
> (The same for dates e.g.: El 3 de abril irá al médico = The 3 of April
> will-go to-the doctor.)
>

I was thinking about cases like this and wondering if I can make an
animate-inanimate-human distinction for nouns, or even just
animate-inanimate, so that I can eliminate a lot of candidates easily. POS
tags and gender info don't say much about this, so I'm still exploring this
option, but it could be incorporated into the impeding parameters, which is
a list that will be appended to.
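
To make the animacy idea a bit more concrete, here is a minimal sketch. The
lexicon, the verb list and the function name are all hypothetical (no such
resource is specified in this thread), and the same information could just
as well be a negative score among the impeding parameters instead of a hard
filter.

```python
# Hypothetical animacy lexicon; where this information would really come
# from is still an open question.
ANIMACY = {
    "lunes": "inanimate",
    "abril": "inanimate",
    "chica": "human",
    "médico": "human",
}

# Hypothetical list of verbs whose subject is expected to be animate.
ANIMATE_SUBJECT_VERBS = {"ir", "comer", "llevar"}


def filter_candidates(verb_lemma, candidates):
    """Drop inanimate candidates when the verb expects an animate subject.

    Candidates missing from the lexicon are kept, since nothing is known
    about them.
    """
    if verb_lemma not in ANIMATE_SUBJECT_VERBS:
        return list(candidates)
    return [c for c in candidates if ANIMACY.get(c, "unknown") != "inanimate"]
```

With this, "lunes" would no longer compete as the subject of "irá", while
candidates that are not in the lexicon stay in the list.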


> Assuming that a pronoun exists right before the verb is highly language
> specific. This works, as a rule, for SVO languages, like English, Spanish
> and Catalan, but will not work for SOV languages, like (typically) Turkic
> and Uralic languages (but also i.a. Hindi and German), and VSO, like i.a.
> Arabic and Celtic languages. As we have quite a lot of non-SVO languages in
> Apertium, searching a subject right before the verb seems a bad guess.
>
> Furthermore, even for an SVO language like Spanish, there are several quite
> frequent verbs whose subject is located after the verb, e.g.:
> Me faltan libros
> Me gustan los plátanos
> Me duelen las muelas
> etc.
> Or in SVO languages like Russian or Esperanto, it is not rare to place the
> subject after the verb, since the case tells us what the subject is.
>
> So, I think the system should deal with different language typologies, and
> probably would need some configuration to deal with "special verbs" in a
> specific language, like "faltar", "gustar", "doler" given in the Spanish
> examples. Of course, you can check what results the system you propose
> gets on the EU corpus, but I don't think there will be a good success rate
> in German, Finnish and Hungarian, and, I guess, the results will be worse
> in Slavic languages than in Romance and, of course, English.
>

Hmmm, I see your point. I guess for zero pronouns it becomes more of a
Subject-Identification or a Semantic-Role-Labelling problem than an
Anaphora Resolution problem. It's true that assuming a zero pronoun right
before the verb wouldn't work for non-SVO languages, and it's funny I missed
this considering Hindi is my mother tongue.

I think it does make sense to build a system that deals with different
language typologies, at least for zero pronouns. It isn't technically
Anaphora Resolution, so I'm a little confused about the scope, but the
problem I have to deal with is choosing the correct pronoun, so there is no
point in restricting myself to traditional Anaphora Resolution.

I explored the old and new methods to do Zero Anaphora Resolution and as
expected, the older methods were heuristic based and the new methods are
corpus based.

"The same trend is observed also in Japanese zero-anaphora resolution,
where the findings made in rule-based or theory-oriented work (Kameyama,
1986; Nakaiwa and Shirai, 1996; Okumura and Tamura, 1996, etc.) have been
successfully incorporated in machine learning-based frameworks (Seki et
al., 2002; Iida et al., 2003)."
Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution


The newer methods are mostly ML-based and require a large amount of data.
However, even the older heuristic-based methods require a large amount of
linguistic information, such as syntactic trees, semantic knowledge, etc.
Here are some examples:

1. An Empirical Study of Zero Anaphora Resolution in Chinese Based on
Centering Model 

The Centering Model, a completely rule-based model and a very popular model
to perform Zero Anaphora Resolution, takes a syntax structure and its
semantic interpretation as input.

"The task of zero and nominal anaphora resolution is performed after the
semantic interpretation phase that converts the syntactic structure of a
sentence into a semantic representation form such as the logic form"

2.  Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution


Explores a hybrid mechanism but still uses ML to train models.

3. Zero Anaphora Resolution in Chinese with Shallow Parsing

Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Hèctor Alòs i Font
Hi Tanmai,

I've added some comments between paragraphs (especially on zero pronouns).

Message from Tanmai Khanna on Sat, 30 March 2019 at 1:11:

> Hi Hector,
> Thanks for all your comments. I really appreciate it! :) I'll try to
> respond to the best of my abilities:
>
> When I claimed "The girl ate his apple" is grammatically incoherent, I
> meant in the case that this is all of the discourse. You're right that a
> pronoun could refer to something in the real world which isn't present in
> discourse, but that kind of anaphora resolution is impossible if you have
> just text so usually, we just ignore it.
>
> Before I start answering the questions, I also want to point out that this
> is an endeavour to build a tool that normally relies on much more complex
> linguistic knowledge, without that knowledge, and to make it good enough
> with the simple linguistic features that are available. Some parts of what
> can or can't be done will be found out experimentally, but I added them in
> my proposal so that we can try and make an informed decision as to whether
> something can be language independent or not.
>
> 1. Following this thought, let's talk about marking verbs with
> antecedents. For dealing with zero pronouns, we *have to* mark the
> verbs with the antecedents and hence it is something that will be a part of
> this tool.
>
> You're right in saying that it will be hard to capture the subject of a
> verb without any configuration. However, that wasn't what I was trying to
> do. *I decided to treat zero pronouns as literally zero pronouns.* Assume
> a pronoun exists right before the verb and then perform anaphora resolution
> on this zero pronoun. This tool will be language agnostic. If the results
> are unsatisfactory, we can funnel down and create language-specific
> features to identify the subject :)
>

Assuming that a pronoun exists right before the verb is highly language
specific. This works, as a rule, for SVO languages, like English, Spanish
and Catalan, but will not work for SOV languages, like (typically) Turkic
and Uralic languages (but also i.a. Hindi and German), and VSO, like i.a.
Arabic and Celtic languages. As we have quite a lot of non-SVO languages in
Apertium, searching a subject right before the verb seems a bad guess.

Furthermore, even for an SVO language like Spanish, there are several quite
frequent verbs whose subject is located after the verb, e.g.:
Me faltan libros
Me gustan los plátanos
Me duelen las muelas
etc.
Or in SVO languages like Russian or Esperanto, it is not rare to place the
subject after the verb, since the case tells us what the subject is.

Also, in a language like Spanish there are quite a lot of time
constructions like
El lunes irá al médico
(word by word translation: Monday will-go to-the doctor)
It is very likely that "lunes" will be chosen as the subject of "irá".
(The same for dates e.g.: El 3 de abril irá al médico = The 3 of April
will-go to-the doctor.)

So, I think the system should deal with different language typologies, and
probably would need some configuration to deal with "special verbs" in a
specific language, like "faltar", "gustar", "doler" given in the Spanish
examples. Of course, you can check what results the system you propose
gets on the EU corpus, but I don't think there will be a good success rate
in German, Finnish and Hungarian, and, I guess, the results will be worse
in Slavic languages than in Romance and, of course, English.
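
The kind of per-language configuration suggested above might look roughly
like the sketch below. Everything in it (the LanguageConfig class, its
fields and the subject_search_side function) is hypothetical and only
illustrates the idea; it is not an existing Apertium format.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageConfig:
    # Basic word order of the language, e.g. "SVO", "SOV" or "VSO"
    # (hypothetical field).
    word_order: str
    # Lemmas of "special verbs" whose subject usually follows the verb
    # (hypothetical field).
    postverbal_subject_verbs: set = field(default_factory=set)


def subject_search_side(config, verb_lemma):
    """Return "before" or "after": the side of the verb to search first."""
    if verb_lemma in config.postverbal_subject_verbs:
        return "after"
    # In a verb-initial order such as VSO the subject follows the verb.
    if config.word_order.upper().startswith("V"):
        return "after"
    return "before"


# Example configuration using the Spanish verbs mentioned above.
SPANISH = LanguageConfig("SVO", {"faltar", "gustar", "doler"})
```

This only picks a search direction. It does not solve the SOV case, where
the subject precedes the verb but is usually not adjacent to it, nor time
expressions such as "El lunes".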

> 2. Identifying antecedents of adjectives (so to speak) will require
> separate metrics, but these examples are exactly along the lines of what
> I've been thinking, i.e. detecting relative clauses and moving them out of
> the way to let the adjective recognise its antecedent. It probably gets
> this right for "The lady with the book" because "the book" is part of a PP,
> which cannot be the subject of "is". Similarly, I will try to add relative
> clause detection so that the clause is skipped and "nice" is connected to
> "the lady".
>
> 3. So "tall" would get the correct agreement if we could do anaphora
> resolution for first and second person pronouns, but that becomes a lot
> more complex than for third person pronouns. Correct me if I'm wrong, but
> first and second person pronouns are usually resolved in the real world,
> and their referents are rarely introduced earlier in the text. If you ask
> me, I would leave those out for now.
> But you're right, it is interesting to think about how to deal with them.
> Maybe in cases where the person introduces themselves first, we should be
> able to attach it to "I" in "I am".
>

Yes, the problem is the one you describe. It is generally impossible in,
for example, an English text to know whether "I" or "you" are male or
female, or whether "we" is inclusive or exclusive, etc. That's why I think
it's better to forget about 1st and 2nd persons (imho).


> 4. I was told that Anaphora is needed in Catalan as well, and if we use
> the same module for both we still have to test how it performs on 

Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Tanmai Khanna
Hi Hector,
Thanks for all your comments. I really appreciate it! :) I'll try to
respond to the best of my abilities:

When I claimed "The girl ate his apple" is grammatically incoherent, I
meant in the case that this is all of the discourse. You're right that a
pronoun could refer to something in the real world which isn't present in
discourse, but that kind of anaphora resolution is impossible if you have
just text so usually, we just ignore it.

Before I start answering the questions, I also want to point out that this
is an endeavour to build a tool that normally relies on much more complex
linguistic knowledge, without that knowledge, and to make it good enough
with the simple linguistic features that are available. Some parts of what
can or can't be done will be found out experimentally, but I added them in
my proposal so that we can try and make an informed decision as to whether
something can be language independent or not.

1. Following this thought, let's talk about marking verbs with antecedents.
For dealing with zero pronouns, we *have to* mark the verbs with the
antecedents and hence it is something that will be a part of this tool.

You're right in saying that it will be hard to capture the subject of a
verb without any configuration. However, that wasn't what I was trying to
do. *I decided to treat zero pronouns as literally zero pronouns.* Assume a
pronoun exists right before the verb and then perform anaphora resolution
on this zero pronoun. This tool will be language agnostic. If the results
are unsatisfactory, we can funnel down and create language-specific
features to identify the subject :)

2. Identifying antecedents of adjectives (so to speak) will require
separate metrics, but these examples are exactly along the lines of what
I've been thinking, i.e. detecting relative clauses and moving them out of
the way to let the adjective recognise its antecedent. It probably gets
this right for "The lady with the book" because "the book" is part of a PP,
which cannot be the subject of "is". Similarly, I will try to add relative
clause detection so that the clause is skipped and "nice" is connected to
"the lady".

3. So "tall" would get the correct agreement if we could do anaphora
resolution for first and second person pronouns, but that becomes a lot
more complex than for third person pronouns. Correct me if I'm wrong, but
first and second person pronouns are usually resolved in the real world,
and their referents are rarely introduced earlier in the text. If you ask
me, I would leave those out for now.
But you're right, it is interesting to think about how to deal with them.
Maybe in cases where the person introduces themselves first, we should be
able to attach it to "I" in "I am".

4. I was told that Anaphora is needed in Catalan as well, and if we use the
same module for both we still have to test how it performs on both. But as
mentioned in the proposal, I'll try to make the anaphora tool as language
agnostic as possible and will test it with multiple pairs to see the
result. If you have any pair suggestions right now that need it I can add
them.

5. I'm using Apertium Simpleton UI for MacOS and for "La chica está aquí,
lleva un vestido rojo.", I get "The girl is here, spends a red dress"
(Attaching Screenshot makes email too big to send so just take my word for
it :P ). Not sure why.
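
Coming back to point 1, the "literal zero pronoun" idea can be sketched on
the sentence from point 5. This is only a rough illustration under
assumptions: the token dictionaries, the "agr" field and the "PRO"
placeholder are hypothetical, the tag names are loosely modelled on
Apertium's, and the check for an overt subject is deliberately naive.

```python
NOMINAL = {"n", "prn"}     # tags treated as possible overt subjects
VERBAL = {"vblex"}         # tags treated as verbs
BOUNDARY = {"cm", "sent"}  # comma and sentence end, used as clause boundaries


def insert_zero_pronouns(tokens):
    """Insert a placeholder pronoun before each verb that has no noun or
    pronoun earlier in the same clause."""
    out = []
    seen_nominal = False  # has a noun/pronoun been seen since the last boundary?
    for tok in tokens:
        pos = tok["pos"]
        if pos in BOUNDARY:
            seen_nominal = False
        elif pos in NOMINAL:
            seen_nominal = True
        elif pos in VERBAL and not seen_nominal:
            # Hypothetical placeholder; it copies the verb's agreement
            # features so that anaphora resolution can run on it later.
            out.append({"lemma": "PRO", "pos": "prn", "zero": True,
                        "agr": tok.get("agr")})
            seen_nominal = True
        out.append(tok)
    return out


# "La chica está aquí, lleva un vestido rojo."
SENTENCE = [
    {"lemma": "el", "pos": "det"},
    {"lemma": "chica", "pos": "n"},
    {"lemma": "estar", "pos": "vblex", "agr": "p3.sg"},
    {"lemma": "aquí", "pos": "adv"},
    {"lemma": ",", "pos": "cm"},
    {"lemma": "llevar", "pos": "vblex", "agr": "p3.sg"},
    {"lemma": "uno", "pos": "det"},
    {"lemma": "vestido", "pos": "n"},
    {"lemma": "rojo", "pos": "adj"},
    {"lemma": ".", "pos": "sent"},
]
```

Here a placeholder ends up right before "llevar", and resolving it to "la
chica" is then an ordinary anaphora resolution step.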

Thanks for all your questions and suggestions, they'll definitely help me
build a better tool. I really hope I was able to answer your questions
satisfactorily. If not, I apologise and I wouldn't mind a follow up. It
will certainly help me even more. :)

On Sat, Mar 30, 2019 at 12:54 AM Hèctor Alòs i Font 
wrote:

> Hi Tanmai,
>
> I won't be a mentor, but I asked for anaphora resolution in Apertium, so,
> if I am allowed, I'd like some clarification about the proposal (which, I
> think, is great - congrats).
>
> First of all, note that "The girl ate his apple" is not grammatically
> incoherent. Maybe she ate an apple given by a male friend of hers. Anaphora
> resolution is complicated i.a. because language is often ambiguous.
>
> 1. I've been thinking about the example
>
> La chica comió su manzana
>
> Let's suppose that the antecedent of "su" is "la chica".
> If the target language were a Slavic language or Esperanto, the choice
> would not only be between "his", "her" or "its", but would also include a
> reflexive possessive pronoun, e.g. in Russian Девушка съела своё яблоко, but
> not Девушка съела её яблоко. If using the proposal in
> http://wiki.apertium.org/wiki/Anaphora_resolution I'm not sure how we could
> deal with it. We would probably need to have a referent on the verb
> too, in order to be able to compare in the transfer rules whether the
> antecedent of "su" is also the antecedent of "comió".
>
> So, my point is: will the user be able to "configure" for which parts of
> speech the antecedent should be tracked? E.g. for the Catalan-Spanish pair
> I don't see any need to track the "antecedents" of verbs, but for e.g.
> Spanish to English it 

Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Hèctor Alòs i Font
Hi Tanmai,

I won't be a mentor, but I asked for anaphora resolution in Apertium, so,
if I am allowed, I'd like some clarification about the proposal (which, I
think, is great - congrats).

First of all, note that "The girl ate his apple" is not grammatically
incoherent. Maybe she ate an apple given by a male friend of hers. Anaphora
resolution is complicated i.a. because language is often ambiguous.

1. I've been thinking about the example

La chica comió su manzana

Let's suppose that the antecedent of "su" is "la chica".
If the target language were a Slavic language or Esperanto, the choice
would not only be between "his", "her" or "its", but would also include a
reflexive possessive pronoun, e.g. in Russian Девушка съела своё яблоко, but
not Девушка съела её яблоко. If using the proposal in
http://wiki.apertium.org/wiki/Anaphora_resolution I'm not sure how we could
deal with it. We would probably need to have a referent on the verb too,
in order to be able to compare in the transfer rules whether the antecedent
of "su" is also the antecedent of "comió".

So, my point is: will the user be able to "configure" for which parts of
speech the antecedent should be tracked? E.g. for the Catalan-Spanish pair
I don't see any need to track the "antecedents" of verbs, but for e.g.
Spanish to English it seems necessary for dealing with zero pronouns.

(By the way, I am surprised that e.g. the subject of a verb can be tracked
by a language-independent tool without any configuration. I really doubt
this can be true.)

2. The examples in "Reflexive pronouns" and "Long distance agreement" seem
very difficult. I'd propose a few simpler agreements:
* The lady with the book is nice.
* The lady reading the book is nice.
* The lady who reads the book is nice.
"Nice" should be feminine in Spanish/Catalan (currently it happens only in
the first case)
* The singers that sing sing well.
Both "sing" should be p3pl in Spanish/Catalan, currently they are not ("Los
cantantes que canta canta bien").

3. Let's accept that we will deal only with the 3rd person. It is too
complicated to resolve:
* I'm tall
(gender?)
* You are tall
(gender? number?)

4. I cannot see why it should be useful to test the system with the
Spanish-English and Catalan-English pairs. As for the anaphora, if I am not
wrong, Catalan and Spanish are twins. One pair of the two seems enough.

5. One detail: the current translation of
La chica está aquí, lleva un vestido rojo.
is:
The girl is here, carries a red dress.
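
To make the comparison in point 1 concrete, here is a minimal sketch of the
decision a Spanish-Russian rule would have to take once both "su" and the
verb carry a referent. The function name and the way referents are
represented (plain identifiers) are assumptions for illustration only, not
part of the wiki proposal. It returns lemmas only; the reflexive "свой"
would still have to agree with the possessed noun (e.g. "своё яблоко").

```python
# Non-reflexive third person possessives in Russian, chosen by the gender
# and number of the possessor.
NON_REFLEXIVE = {"m": "его", "f": "её", "nt": "его", "pl": "их"}


def russian_possessive(possessor_ref, subject_ref, possessor_gender):
    """Return the lemma of the possessive to use for Spanish "su".

    possessor_ref and subject_ref are hypothetical referent identifiers
    attached to "su" and to the verb; None means "not resolved".
    """
    if possessor_ref is not None and possessor_ref == subject_ref:
        # The possessor is the subject of the clause: reflexive possessive.
        return "свой"
    # Otherwise choose by the possessor's gender/number; None if unknown.
    return NON_REFLEXIVE.get(possessor_gender)
```

For "La chica comió su manzana", with "su" and "comió" both pointing to "la
chica", this gives "свой"; if "su" pointed to some other female referent,
it would give "её".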

Best,
Hèctor

Message from Tanmai Khanna on Fri, 29 March 2019 at 15:48:

> Hi,
> I have submitted a draft for review for the project "Anaphora Resolution"
> for GSoC 2019. The project will also include a tool for resolution of
> agreement for adjectives in Spanish, Catalan and other languages that need
> it.
>
> You can find the proposal here:
> http://wiki.apertium.org/wiki/User:Khannatanmai
>
> If anyone has any comments, suggestions, criticism, ideas, I would
> really appreciate if you let me know as it'll help me make a stronger
> proposal and a better tool for Apertium during GSoC 2019.
>
> Thanks and Regards,
> Tanmai Khanna
> IRC: khannatanmai
>
> --
> *Khanna, Tanmai*
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>


[Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Tanmai Khanna
Hi,
I have submitted a draft for review for the project "Anaphora Resolution"
for GSoC 2019. The project will also include a tool for resolution of
agreement for adjectives in Spanish, Catalan and other languages that need
it.

You can find the proposal here:
http://wiki.apertium.org/wiki/User:Khannatanmai

If anyone has any comments, suggestions, criticism, ideas, I would
really appreciate if you let me know as it'll help me make a stronger
proposal and a better tool for Apertium during GSoC 2019.

Thanks and Regards,
Tanmai Khanna
IRC: khannatanmai

-- 
*Khanna, Tanmai*