Re: [Apertium-stuff] beta.apertium.org

2019-03-29 Thread Sushain Cherivirala
Hi Hèctor,

It should be using whatever is currently on GitHub. The beta site doesn't
use any released pairs, it directly downloads and compiles everything from
source.

So, I think it should be using apertium-cat-ita since that's what I see in
https://github.com/apertium/apertium-cat-ita/blob/master/modes.xml.
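
To check which translation modes a pair's modes.xml declares, something like the following rough sketch could help. The sample XML is invented for illustration and is not the actual content of the apertium-cat-ita file; the helper name installed_modes is also hypothetical, and filtering on install="yes" is an assumption about which modes matter here.

```python
import xml.etree.ElementTree as ET

# Invented, trimmed-down example of a modes.xml file; the real file in
# apertium-cat-ita will contain full pipelines and may list other modes.
SAMPLE_MODES_XML = """
<modes>
  <mode name="cat-ita" install="yes">
    <pipeline>
      <program name="lt-proc">
        <file name="cat-ita.automorf.bin"/>
      </program>
    </pipeline>
  </mode>
  <mode name="cat-ita-debug" install="no">
    <pipeline>
      <program name="lt-proc">
        <file name="cat-ita.automorf.bin"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def installed_modes(modes_xml):
    """Return the names of all <mode> elements marked install="yes"."""
    root = ET.fromstring(modes_xml)
    return [
        mode.get("name")
        for mode in root.iter("mode")
        if mode.get("install") == "yes"
    ]


if __name__ == "__main__":
    print(installed_modes(SAMPLE_MODES_XML))
```

With a real checkout, the file contents would be read from disk instead of the sample string.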

*Sushain K. Cherivirala *
Stanford University, M.S. in Computer Science '19
Carnegie Mellon University, B.S. in Computer Science '18
(713) 992-4043 | www.skc.name


On Sat, Mar 30, 2019 at 1:41 AM Hèctor Alòs i Font 
wrote:

> Hi Sushain,
>
> I have a question. For e.g. the Catalan-Italian and the Catalan-Portuguese
> language pairs there are the two-letter released versions and the
> three-letter unreleased ones (apertium-ca-it vs apertium-cat-ita,
> apertium-pt-ca vs apertium-por-cat). Which ones does beta.apertium.org use
> in such cases?
>
> Best,
> Hèctor
>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] beta.apertium.org

2019-03-29 Thread Hèctor Alòs i Font
Hi Sushain,

I have a question. For e.g. the Catalan-Italian and the Catalan-Portuguese
language pairs there are the two-letter released versions and the
three-letter unreleased ones (apertium-ca-it vs apertium-cat-ita,
apertium-pt-ca vs apertium-por-cat). Which ones does beta.apertium.org use
in such cases?

Best,
Hèctor



Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Hèctor Alòs i Font
Hi Tanmai,

I've added some comments between the paragraphs (especially on zero pronouns).

Message from Tanmai Khanna on Sat, 30 March 2019 at 1:11:

> Hi Hector,
> Thanks for all your comments. I really appreciate it! :) I'll try to
> respond to the best of my abilities:
>
> When I claimed "The girl ate his apple" is grammatically incoherent, I
> meant in the case that this is all of the discourse. You're right that a
> pronoun could refer to something in the real world which isn't present in
> discourse, but that kind of anaphora resolution is impossible if you have
> just text so usually, we just ignore it.
>
> Before I start answering the question, I also want to point out that this
> is an endeavour to build a tool that would otherwise use a lot more
> linguistically complex knowledge, without that knowledge, and to make it
> good enough with the simple linguistic features available. Some parts of
> what can or can't be done will be found out experimentally, but I added
> them in my proposal so that we can try and make an informed decision as to
> whether something can be language independent or not.
>
> 1. Following this thought, let's talk about marking verbs with
> antecedents. For dealing with zero pronouns, we *have to* mark the
> verbs with the antecedents and hence it is something that will be a part of
> this tool.
>
> You're right in saying that it will be hard to capture the subject of a
> verb without any configuration. However, that wasn't what I was trying to
> do. *I decided to treat zero pronouns as literally zero pronouns.* Assume
> a pronoun exists right before the verb and then perform anaphora resolution
> on this zero pronoun. This tool will be language agnostic. If the results
> are unsatisfactory, we can funnel down and create language-specific
> features to identify the subject :)
>

Assuming that a pronoun exists right before the verb is highly language
specific. This works, as a rule, for SVO languages, like English, Spanish
and Catalan, but will not work for SOV languages, like (typically) Turkic
and Uralic languages (but also i.a. Hindi and German), and VSO languages,
like i.a. Arabic and Celtic languages. As we have quite a lot of non-SVO
languages in Apertium, searching for a subject right before the verb seems
a bad guess.

Furthermore, even for an SVO language like Spanish, there are several quite
frequent verbs whose subject is located after the verb, e.g.:
Me faltan libros
Me gustan los plátanos
Me duelen las muelas
etc.
Or in SVO languages like Russian or Esperanto, it is not rare to place the
subject after the verb, since the case tells us what the subject is.

Also, in a language like Spanish there are quite a lot of time
constructions like
El lunes irá al médico
(word-for-word translation: Monday will-go to-the doctor)
It is very likely that "lunes" will be chosen as the subject of "irá".
(The same goes for dates, e.g.: El 3 de abril irá al médico = The 3 of April
will-go to-the doctor.)

So, I think the system should deal with different language typologies, and
would probably need some configuration to deal with "special verbs" in a
specific language, like "faltar", "gustar", "doler" given in the Spanish
examples. Of course, you can check what the results are on the EU corpus
with the system you propose, but I don't think there will be a good
percentage of success in German, Finnish and Hungarian, and, I guess, the
results will be worse in Slavic languages than in Romance languages and, of
course, English.
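
The kind of per-language configuration discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the proposal: LanguageConfig, find_subject and all of their fields are hypothetical names, tokens are simplified to (lemma, part-of-speech) pairs, and "nearest noun on the expected side of the verb" is a deliberately naive heuristic.

```python
from dataclasses import dataclass


@dataclass
class LanguageConfig:
    # All fields are hypothetical settings invented for this sketch.
    subject_before_verb: bool = True  # default search direction
    postverbal_subject_verbs: frozenset = frozenset()  # e.g. "gustar"
    temporal_nouns: frozenset = frozenset()  # nouns never taken as subject


def find_subject(tokens, verb_index, config):
    """Return the index of the nearest candidate subject noun, or None.

    tokens is a list of (lemma, pos) pairs. None means no overt subject
    was found, so a zero pronoun would have to be resolved instead.
    """
    verb_lemma, _ = tokens[verb_index]
    look_before = (
        config.subject_before_verb
        and verb_lemma not in config.postverbal_subject_verbs
    )
    if look_before:
        candidates = range(verb_index - 1, -1, -1)
    else:
        candidates = range(verb_index + 1, len(tokens))
    for i in candidates:
        lemma, pos = tokens[i]
        if pos == "n" and lemma not in config.temporal_nouns:
            return i
    return None


# Hypothetical Spanish settings based on the examples in this message.
SPANISH = LanguageConfig(
    postverbal_subject_verbs=frozenset({"faltar", "gustar", "doler"}),
    temporal_nouns=frozenset({"lunes"}),
)

# "El lunes irá al médico"
EL_LUNES = [("el", "det"), ("lunes", "n"), ("ir", "vblex"),
            ("a", "pr"), ("el", "det"), ("médico", "n")]
# "Me faltan libros"
ME_FALTAN = [("me", "prn"), ("faltar", "vblex"), ("libro", "n")]

if __name__ == "__main__":
    print(find_subject(EL_LUNES, 2, LanguageConfig()))  # picks "lunes"
    print(find_subject(EL_LUNES, 2, SPANISH))           # no overt subject
    print(find_subject(ME_FALTAN, 1, SPANISH))          # picks "libro"
```

Without any configuration the heuristic picks "lunes" as the subject of "irá"; with the Spanish settings it reports no overt subject there and finds "libros" after "faltan".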

> 2. Identifying antecedents of adjectives (so to speak) will require
> separate metrics, but these examples are exactly along the lines of what
> I've been thinking, i.e. detecting relative clauses and moving them out of
> the way to let the adjective recognise its antecedent. It probably
> recognises that for "The lady with the book" because "the book" is part
> of a PP which cannot be the subject of "is". Similarly, I will try to create
> relative clause detection to skip the clause and connect "nice" to "the lady".
>
> 3. So "tall" would get the correct adjective if we could do anaphora
> resolution for first and second person pronouns but that becomes a lot more
> complex than third person pronouns. Correct me if I'm wrong, but first and
> second person pronouns are usually resolved in the real world, and not very
> often said first in context. If you ask me I would leave those out for now.
> But you're right, it is interesting to think about how to deal with them.
> Maybe in cases where the person introduces themselves first, we should be
> able to attach it to "I" in "I am".
>

Yes, the problem is the one you describe. It is generally impossible in, for
example, an English text to know whether "I" or "you" are male or female,
or whether "we" is inclusive or exclusive, etc. That's why I think it's
better to forget about 1st and 2nd persons (imho).


> 4. I was told that Anaphora is needed in Catalan as well, and if we use
> the same module for both we still have to test how it performs on both

Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Tanmai Khanna
Hi Hector,
Thanks for all your comments. I really appreciate it! :) I'll try to
respond to the best of my abilities:

When I claimed "The girl ate his apple" is grammatically incoherent, I
meant in the case that this is all of the discourse. You're right that a
pronoun could refer to something in the real world which isn't present in
discourse, but that kind of anaphora resolution is impossible if you have
just text so usually, we just ignore it.

Before I start answering the question, I also want to point out that this
is an endeavour to build a tool that would otherwise use a lot more
linguistically complex knowledge, without that knowledge, and to make it
good enough with the simple linguistic features available. Some parts of
what can or can't be done will be found out experimentally, but I added
them in my proposal so that we can try and make an informed decision as to
whether something can be language independent or not.

1. Following this thought, let's talk about marking verbs with antecedents.
For dealing with zero pronouns, we *have to* mark the verbs with the
antecedents and hence it is something that will be a part of this tool.

You're right in saying that it will be hard to capture the subject of a
verb without any configuration. However, that wasn't what I was trying to
do. *I decided to treat zero pronouns as literally zero pronouns.* Assume a
pronoun exists right before the verb and then perform anaphora resolution
on this zero pronoun. This tool will be language agnostic. If the results
are unsatisfactory, we can funnel down and create language-specific
features to identify the subject :)

2. Identifying antecedents of adjectives (so to speak) will require
separate metrics, but these examples are exactly along the lines of what
I've been thinking, i.e. detecting relative clauses and moving them out of
the way to let the adjective recognise its antecedent. It probably
recognises that for "The lady with the book" because "the book" is part of
a PP which cannot be the subject of "is". Similarly, I will try to create
relative clause detection to skip the clause and connect "nice" to "the lady".

3. So "tall" would get the correct adjective if we could do anaphora
resolution for first and second person pronouns but that becomes a lot more
complex than third person pronouns. Correct me if I'm wrong, but first and
second person pronouns are usually resolved in the real world, and not very
often said first in context. If you ask me I would leave those out for now.
But you're right, it is interesting to think about how to deal with them.
Maybe in cases where the person introduces themselves first, we should be
able to attach it to "I" in "I am".

4. I was told that Anaphora is needed in Catalan as well, and if we use the
same module for both we still have to test how it performs on both. But as
mentioned in the proposal, I'll try to make the anaphora tool as language
agnostic as possible and will test it with multiple pairs to see the
result. If you have any pair suggestions right now that need it I can add
them.

5. I'm using Apertium Simpleton UI for macOS and for "La chica está aquí,
lleva un vestido rojo.", I get "The girl is here, spends a red dress"
(attaching a screenshot makes the email too big to send, so just take my
word for it :P ). Not sure why.

Thanks for all your questions and suggestions, they'll definitely help me
build a better tool. I really hope I was able to answer your questions
satisfactorily. If not, I apologise and I wouldn't mind a follow up. It
will certainly help me even more. :)


Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Hèctor Alòs i Font
Hi Tanmai,

I won't be a mentor, but I asked for anaphora resolution in Apertium, so,
if I am allowed, I'd like some clarification about the proposal (which, I
think, is great - congrats).

First of all, note that "The girl ate his apple" is not grammatically
incoherent. Maybe she ate an apple given by a male friend of hers. Anaphora
resolution is complicated i.a. because language is often ambiguous.

1. I've been thinking about the example

La chica comió su manzana

Let's suppose that the antecedent of "su" is "la chica".
If the target language were a Slavic language or Esperanto, the choice
would not only be between "his", "her" or "its": there would also be a
reflexive possessive pronoun, e.g. in Russian Девушка съела своё яблоко, but
not Девушка съела её яблоко. Using the proposal in
http://wiki.apertium.org/wiki/Anaphora_resolution, I'm not sure how we
could deal with it. We would probably need to have a referent on the verb
too, in order to be able to compare in the transfer rules whether the
antecedent of "su" is also the antecedent of "comió".

So, my point is: will the user be able to "configure" for which parts of
speech the antecedent should be tracked? E.g. for the Catalan-Spanish pair
I don't see any need to track the "antecedents" of verbs, but for e.g.
Spanish to English it seems necessary for dealing with zero pronouns.
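
The comparison described in point 1 could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name russian_possessive and the idea of passing antecedent identifiers as plain values are hypothetical and do not describe how Apertium transfer rules or the proposed tool work. The function returns a lemma only; inflecting it to agree with the possessed noun (e.g. своё for яблоко) is left out.

```python
def russian_possessive(possessor_id, subject_id, gender, number="sg"):
    """Pick a Russian 3rd-person possessive lemma.

    possessor_id: hypothetical identifier of the antecedent of the possessive
    subject_id:   hypothetical identifier of the antecedent (subject) of the verb
    gender:       "m", "f" or "nt", gender of the possessor
    number:       "sg" or "pl", number of the possessor
    """
    # Same referent as the subject of the verb: reflexive possessive.
    if possessor_id is not None and possessor_id == subject_id:
        return "свой"
    # Otherwise a plain 3rd-person possessive chosen by the possessor.
    if number == "pl":
        return "их"
    return "её" if gender == "f" else "его"


if __name__ == "__main__":
    # "La chica comió su manzana", with "su" referring to "la chica"
    print(russian_possessive("chica", "chica", "f"))
    # Same sentence, with "su" referring to some other woman
    print(russian_possessive("otra", "chica", "f"))
```

The point of the sketch is that the choice cannot be made from the possessive's antecedent alone: the verb's subject has to be available for comparison.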

(By the way, I am surprised that e.g. the subject of a verb can be tracked
by a language-independent tool without any configuration. I really doubt
this can be true.)

2. The examples in "Reflexive pronouns" and "Long distance agreement" seem
very difficult. I'd propose a few simpler agreements:
* The lady with the book is nice.
* The lady reading the book is nice.
* The lady who reads the book is nice.
"Nice" should be feminine in Spanish/Catalan (currently it happens only in
the first case)
* The singers that sing sing well.
Both "sing" should be p3pl in Spanish/Catalan, currently they are not ("Los
cantantes que canta canta bien").

3. Let's accept that we will deal only with the 3rd person. It is too
complicated to resolve:
* I'm tall
(gender?)
* You are tall
(gender? number?)

4. I cannot see why it should be useful to test the system with both the
Spanish-English and Catalan-English pairs. As far as anaphora is concerned,
if I am not mistaken, Catalan and Spanish are twins. One of the two pairs
seems enough.

5. One detail: the current translation of
La chica está aquí, lleva un vestido rojo.
is:
The girl is here, carries a red dress.

Best,
Hèctor



Re: [Apertium-stuff] beta.apertium.org

2019-03-29 Thread Sushain Cherivirala
Hi Fran (and others),

This is a pretty late update but as of earlier today, the beta site should
be updating nightly again!

All the old SVN checkouts were cleared out and replaced with Git clones.
Special thanks to unhammer for fixing up apertium-get (and quickly resolving
the new issues I made yesterday)!

I also misspoke earlier re. how APy updates are deployed. Html-tools must be
updated via SSH. However, the nightly update script pulls down the APy docker
image from Docker Hub. FWIW, the Docker build had been broken by a switch to
pipenv until a couple days ago but it should work fine now. The upshot is,
updates to things like language names in APy master should be reflected on
beta.apertium.org within 24 hours.

*Sushain K. Cherivirala *
Stanford University, M.S. in Computer Science '19
Carnegie Mellon University, B.S. in Computer Science '18
(713) 992-4043 | www.skc.name


On Sun, Jul 15, 2018 at 9:51 PM Sushain Cherivirala 
wrote:

> Fran,
>
> In theory, the pairs are updated nightly automatically:
>
> @daily cd /home/apertium/beta/apertium-html-tools && ./tools/docker/deploy-all-pairs.sh
>
> However, due to the following issue, that script hasn't done much for a
> while:
>
> https://github.com/apertium/apertium-get/issues/7
>
> Updates to the APy/Html-tools code are manual via SSH into the projectjj
> machine. Continuous deployment could be set up but it doesn't seem like it
> would add much value.
>
>
> *Sushain K. Cherivirala *
> Software Engineer Intern, Stripe
> Stanford University, M.S. in Computer Science '19
> Carnegie Mellon University, B.S. in Computer Science '18
> (713) 992-4043 | www.skc.name
>
> On Sun, Jul 15, 2018 at 3:32 AM, Francis Tyers 
> wrote:
>
>> Does anyone know how this works? It would be good to have documentation
>> on the Wiki:
>>
>> http://wiki.apertium.org/wiki/Beta
>>
>> * How often are pairs there updated? Is it nightly?
>> * Who can update the pairs there?
>> * What needs to be done in order to update them?
>>
>> If someone knows, please reply here and I'll summarise the wisdom on the
>> Wiki.
>>
>> Thanks!
>>
>> Fran


[Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution

2019-03-29 Thread Tanmai Khanna
Hi,
I have submitted a draft for review for the project "Anaphora Resolution"
for GSoC 2019. The project will also include a tool for resolution of
agreement for adjectives in Spanish, Catalan and other languages that need
it.

You can find the proposal here:
http://wiki.apertium.org/wiki/User:Khannatanmai

If anyone has any comments, suggestions, criticism or ideas, I would
really appreciate it if you let me know, as it'll help me make a stronger
proposal and a better tool for Apertium during GSoC 2019.

Thanks and Regards,
Tanmai Khanna
IRC: khannatanmai

-- 
*Khanna, Tanmai*