Re: [Apertium-stuff] beta.apertium.org
Hi Hèctor,

It should be using whatever is currently on GitHub. The beta site doesn't use any released pairs; it downloads and compiles everything directly from source. So I think it should be using apertium-cat-ita, since that's what I see in https://github.com/apertium/apertium-cat-ita/blob/master/modes.xml.

Sushain K. Cherivirala
Stanford University, M.S. in Computer Science '19
Carnegie Mellon University, B.S. in Computer Science '18
(713) 992-4043 | www.skc.name

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
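A minimal sketch of the check Sushain describes, i.e. reading a pair's modes.xml to see which translation modes it defines. It assumes the usual Apertium layout, in which each `<mode>` element has a `name` attribute and user-facing modes are marked `install="yes"`. The XML below is an abbreviated, hypothetical excerpt (not the real apertium-cat-ita file), and `installed_modes` is an illustrative name, not an Apertium tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical excerpt in the general shape of an Apertium
# modes.xml; the real apertium-cat-ita file defines more modes and stages.
SAMPLE_MODES_XML = """
<modes>
  <mode name="cat-ita" install="yes">
    <pipeline>
      <program name="lt-proc">
        <file name="cat-ita.automorf.bin"/>
      </program>
    </pipeline>
  </mode>
  <mode name="cat-ita-morph">
    <pipeline>
      <program name="lt-proc">
        <file name="cat-ita.automorf.bin"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def installed_modes(modes_xml: str) -> list[str]:
    """Return the names of all <mode> elements marked install="yes"."""
    root = ET.fromstring(modes_xml.strip())
    return [
        mode.get("name")
        for mode in root.iter("mode")
        if mode.get("install") == "yes"
    ]


if __name__ == "__main__":
    print(installed_modes(SAMPLE_MODES_XML))  # ['cat-ita']
```

Debug modes without `install="yes"` (such as the hypothetical `cat-ita-morph` above) are skipped, so the result lists only the directions a user would normally select.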
Re: [Apertium-stuff] beta.apertium.org
Hi Sushain,

I have a question. For example, the Catalan-Italian and Catalan-Portuguese language pairs each have a released version with two-letter codes and an unreleased version with three-letter codes (apertium-ca-it vs. apertium-cat-ita, apertium-pt-ca vs. apertium-por-cat). Which ones does beta.apertium use in such cases?

Best,
Hèctor
Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution
Hi Tanmai,

I've added some comments inline (especially on zero pronouns).

On Sat, 30 March 2019 at 1:11, Tanmai Khanna wrote:
> [...]
> 1. Following this thought, let's talk about marking verbs with antecedents. For dealing with zero pronouns, we *have to* mark the verbs with the antecedents and hence it is something that will be a part of this tool.
>
> You're right in saying that it will be hard to capture the subject of a verb without any configuration. However, that wasn't what I was trying to do. *I decided to treat zero pronouns as literally zero pronouns.* Assume a pronoun exists right before the verb and then perform anaphora resolution on this zero pronoun. This tool will be language agnostic. If the results are unsatisfactory, we can funnel down and create language-specific features to identify the subject :)

Assuming that a pronoun exists right before the verb is highly language-specific. As a rule, this works for SVO languages like English, Spanish and Catalan, but it will not work for SOV languages, such as (typically) the Turkic and Uralic languages (but also, among others, Hindi and German), or for VSO languages, such as Arabic and the Celtic languages. Since Apertium has quite a lot of non-SVO languages, looking for the subject right before the verb seems a bad guess.

Furthermore, even in an SVO language like Spanish, there are several quite common verbs whose subject is placed after the verb, e.g.:

Me faltan libros (I lack books; lit. "books are lacking to me")
Me gustan los plátanos (I like bananas; lit. "bananas please me")
Me duelen las muelas (my molars hurt)
etc.

And in SVO languages like Russian or Esperanto, it is not rare to place the subject after the verb, since the case marking tells us which word is the subject.

Also, Spanish has quite a lot of time constructions like

El lunes irá al médico (word-by-word: "Monday will-go to-the doctor")

It is very likely that "lunes" will be chosen as the subject of "irá". (The same goes for dates, e.g. "El 3 de abril irá al médico" = "The 3 of April will-go to-the doctor".)

So I think the system should handle different language typologies, and it would probably need some configuration to deal with "special verbs" in a specific language, like "faltar", "gustar" and "doler" in the Spanish examples above. Of course, you can test the system you propose on the EU corpus and see the results, but I don't think there will be a good success rate for German, Finnish and Hungarian, and I guess the results will be worse for Slavic languages than for Romance languages and, of course, English.

> [...]
> 3. So "tall" would get the correct adjective if we could do anaphora resolution for first and second person pronouns but that becomes a lot more complex than third person pronouns. Correct me if I'm wrong, but first and second person pronouns are usually resolved in the real world, and not very often said first in context. If you ask me I would leave those out for now. But you're right, it is interesting to think about how to deal with them. Maybe in cases where the person introduces themselves first, we should be able to attach it to "I" in "I am".

Yes, the problem is the one you describe. In an English text, for example, it is generally impossible to know whether "I" or "you" is male or female, or whether "we" is inclusive or exclusive, etc. That's why I think it's better to forget about the 1st and 2nd persons (imho).
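To make Hèctor's word-order objection concrete, here is a toy Python sketch. It is not the proposal's algorithm: it models the "look for the subject right before the verb" reading as "take the nearest noun to the left of the verb", and adds a hypothetical per-language list of verbs (faltar, gustar, doler) for which the subject is searched for to the right instead. The `Token` class and its simplified tags are assumptions for illustration, not Apertium's stream format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    surface: str
    lemma: str
    pos: str  # simplified tag: "n" noun, "vblex" verb, "prn", "det", "pr"


# Hypothetical Spanish configuration: verbs whose subject usually follows them.
POSTVERBAL_SUBJECT_VERBS = {"faltar", "gustar", "doler"}


def naive_subject(tokens: List[Token], verb_i: int) -> Optional[Token]:
    """Language-agnostic guess: the nearest noun to the left of the verb."""
    for tok in reversed(tokens[:verb_i]):
        if tok.pos == "n":
            return tok
    return None


def configured_subject(tokens: List[Token], verb_i: int) -> Optional[Token]:
    """Like naive_subject, but search to the right for configured verbs."""
    if tokens[verb_i].lemma in POSTVERBAL_SUBJECT_VERBS:
        for tok in tokens[verb_i + 1:]:
            if tok.pos == "n":
                return tok
        return None
    return naive_subject(tokens, verb_i)


# "Me gustan los plátanos": the subject is "los plátanos", after the verb.
GUSTAR = [Token("Me", "me", "prn"), Token("gustan", "gustar", "vblex"),
          Token("los", "el", "det"), Token("plátanos", "plátano", "n")]
# "El lunes irá al médico": "lunes" is a time expression, not the subject.
LUNES = [Token("El", "el", "det"), Token("lunes", "lunes", "n"),
         Token("irá", "ir", "vblex"), Token("al", "a", "pr"),
         Token("médico", "médico", "n")]

if __name__ == "__main__":
    print(naive_subject(GUSTAR, 1))               # None
    print(configured_subject(GUSTAR, 1).surface)  # plátanos
    print(naive_subject(LUNES, 2).surface)        # lunes (the wrong guess)
```

The verb list fixes the "gustar" case but not the time expression "El lunes", which is exactly the failure Hèctor predicts; handling it would need further language-specific knowledge (for instance about temporal nouns).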
Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution
Hi Hector,

Thanks for all your comments, I really appreciate them! :) I'll try to respond as best I can.

When I claimed "The girl ate his apple" is grammatically incoherent, I meant in the case where this is the whole discourse. You're right that a pronoun could refer to something in the real world that isn't present in the discourse, but that kind of anaphora resolution is impossible when you only have text, so it is usually ignored.

Before I start answering the questions, I also want to point out that this is an attempt to build, without deep linguistic knowledge, a tool that would normally rely on it, and to make it good enough using the simple linguistic features that are available. Some of what can or can't be done will only be found out experimentally, but I put these points in my proposal so that we can make an informed decision about whether something can be language-independent or not.

1. Following this thought, let's talk about marking verbs with antecedents. To deal with zero pronouns, we *have to* mark the verbs with their antecedents, so this will be part of the tool.

You're right that it will be hard to capture the subject of a verb without any configuration. However, that wasn't what I was trying to do. *I decided to treat zero pronouns as literally zero pronouns:* assume a pronoun exists right before the verb and then perform anaphora resolution on this zero pronoun. This keeps the tool language-agnostic. If the results are unsatisfactory, we can narrow things down and create language-specific features to identify the subject :)

2. Identifying the antecedents of adjectives (so to speak) will require separate metrics, but these examples are exactly along the lines of what I've been thinking, i.e. detecting relative clauses and moving them out of the way so that the adjective can find its antecedent. The system probably handles "The lady with the book" because "the book" is part of a PP, which cannot be the subject of "is"; similarly, I will try to detect relative clauses so that they are skipped and "nice" is connected to "the lady".

3. "Tall" would get the correct form if we could do anaphora resolution for first and second person pronouns, but that is a lot more complex than for third person pronouns. Correct me if I'm wrong, but first and second person pronouns are usually resolved in the real world and are not often introduced earlier in the context. If you ask me, I would leave those out for now. But you're right, it is interesting to think about how to deal with them. Maybe in cases where speakers introduce themselves first, we could attach that information to "I" in "I am".

4. I was told that anaphora resolution is needed in Catalan as well, and if we use the same module for both we still have to test how it performs on each. But as mentioned in the proposal, I'll try to make the anaphora tool as language-agnostic as possible and will test it with multiple pairs to see the results. If you have any suggestions for pairs that need it, I can add them now.

5. I'm using the Apertium Simpleton UI for macOS, and for "La chica está aquí, lleva un vestido rojo." I get "The girl is here, spends a red dress" (attaching a screenshot makes the email too big to send, so just take my word for it :P). Not sure why.

Thanks for all your questions and suggestions, they'll definitely help me build a better tool. I really hope I was able to answer your questions satisfactorily. If not, I apologise, and I wouldn't mind a follow-up. It will certainly help me even more. :)
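A toy sketch of the idea in Tanmai's point 2 (moving PPs and relative clauses out of the way so that a predicate adjective finds its antecedent), applied to Spanish translations of Hèctor's examples. It assumes simplified, Apertium-like tags (`pr`, `rel`, `ger`, `vbser`) and a deliberately crude rule: once a preposition, relative pronoun or gerund appears, everything up to the copula is treated as embedded. The function names are hypothetical, and a real implementation would need proper clause-boundary detection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    surface: str
    pos: str          # simplified tags: det, n, pr, rel, vblex, ger, vbser
    gender: str = ""  # "m" or "f" for nouns


# Tags assumed to open an embedded phrase (PP, relative clause, gerund phrase)
# whose nouns cannot be the subject of the main verb.
EMBEDDING_TAGS = {"pr", "rel", "ger"}


def nearest_noun(tokens: List[Token], end: int) -> Optional[Token]:
    """Baseline: the closest noun before position `end`."""
    for tok in reversed(tokens[:end]):
        if tok.pos == "n":
            return tok
    return None


def predicate_antecedent(tokens: List[Token]) -> Optional[Token]:
    """Antecedent of an adjective after the copula: the last noun before the
    copula that is not inside an embedded phrase."""
    copula = next((i for i, t in enumerate(tokens) if t.pos == "vbser"), None)
    if copula is None:
        return None
    head, embedded = None, False
    for tok in tokens[:copula]:
        if tok.pos in EMBEDDING_TAGS:
            embedded = True  # crude: stays embedded until the copula
        elif tok.pos == "n" and not embedded:
            head = tok
    return head


# "La señora con el libro es ..." (The lady with the book is ...)
WITH_PP = [Token("La", "det"), Token("señora", "n", "f"), Token("con", "pr"),
           Token("el", "det"), Token("libro", "n", "m"), Token("es", "vbser")]
# "La señora que lee el libro es ..." (The lady who reads the book is ...)
WITH_REL = [Token("La", "det"), Token("señora", "n", "f"), Token("que", "rel"),
            Token("lee", "vblex"), Token("el", "det"), Token("libro", "n", "m"),
            Token("es", "vbser")]

if __name__ == "__main__":
    for sent in (WITH_PP, WITH_REL):
        head = predicate_antecedent(sent)
        baseline = nearest_noun(sent, len(sent) - 1)
        # prints "señora f | baseline: libro" for both sentences
        print(head.surface, head.gender, "| baseline:", baseline.surface)
```

The nearest-noun baseline would pick the masculine "libro" and produce "simpático", while skipping the embedded phrase recovers the feminine "señora" and hence "simpática".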
Re: [Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution
Hi Tanmai,

I won't be a mentor, but I asked for anaphora resolution in Apertium, so, if I may, I'd like some clarification about the proposal (which, I think, is great; congrats).

First of all, note that "The girl ate his apple" is not grammatically incoherent. Maybe she ate an apple given to her by a male friend. Anaphora resolution is complicated, among other reasons, because language is often ambiguous.

1. I've been thinking about the example

La chica comió su manzana (The girl ate her/his/its/their apple)

Let's suppose that the antecedent of "su" is "la chica". If the target language were a Slavic language or Esperanto, the choice would not only be between "his", "her" and "its"; it would also include a reflexive possessive pronoun, e.g. in Russian "Девушка съела своё яблоко" (the girl ate her own apple), not "Девушка съела её яблоко" (the girl ate her [someone else's] apple). With the proposal in http://wiki.apertium.org/wiki/Anaphora_resolution I'm not sure how we could deal with this. We would probably need a referent on the verb too, so that the transfer rules can check whether the antecedent of "su" is also the antecedent of "comió".

So my point is: will the user be able to configure which parts of speech the antecedent should be tracked for? E.g. for the Catalan-Spanish pair I don't see any need to track the "antecedents" of verbs, but for e.g. Spanish to English it seems necessary for dealing with zero pronouns. (By the way, I am surprised that, e.g., the subject of a verb can be tracked by a language-independent tool without any configuration. I really doubt this can be true.)

2. The examples in "Reflexive pronouns" and "Long distance agreement" seem very difficult. I'd propose a few simpler agreement cases:

* The lady with the book is nice.
* The lady reading the book is nice.
* The lady who reads the book is nice.

"Nice" should be feminine in Spanish/Catalan (currently this happens only in the first case).

* The singers that sing sing well.

Both occurrences of "sing" should be p3pl in Spanish/Catalan; currently they are not ("Los cantantes que canta canta bien").

3. Let's accept that we will deal only with the 3rd person. These are too complicated to resolve:

* I'm tall (gender?)
* You are tall (gender? number?)

4. I cannot see why it would be useful to test the system with both the Spanish-English and Catalan-English pairs. As far as anaphora is concerned, if I am not wrong, Catalan and Spanish are twins. One of the two pairs seems enough.

5. One detail: the current translation of

La chica está aquí, lleva un vestido rojo.

is:

The girl is here, carries a red dress.

Best,
Hèctor
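The comparison Hèctor suggests in point 1 can be sketched in Python as follows (in Apertium itself it would live in the transfer rules, not in Python). The sketch assumes that the anaphora resolver attaches an antecedent identifier both to "su" and to the verb's subject, covers only the 3rd person (as agreed in point 3), and assumes the possessed noun is an inanimate direct object, so the accusative forms of "свой" apply. All names and identifiers are hypothetical.

```python
# Accusative forms of the reflexive possessive "свой", which agree with the
# possessed noun (assumed here to be an inanimate direct object).
SVOJ_ACC = {"m": "свой", "f": "свою", "nt": "своё", "pl": "свои"}
# Non-reflexive third-person possessives: they depend only on the possessor.
NON_REFLEXIVE = {"m": "его", "nt": "его", "f": "её", "pl": "их"}


def russian_possessive(possessor_ref: str, subject_ref: str,
                       possessor_gender: str, possessed_gender: str) -> str:
    """Choose the Russian possessive for Spanish "su".

    possessor_ref and subject_ref are hypothetical antecedent identifiers that
    an anaphora resolver would attach to "su" and to the verb's subject.
    """
    if possessor_ref == subject_ref:
        # Same referent as the subject: Russian uses the reflexive "свой".
        return SVOJ_ACC[possessed_gender]
    # Different referent: choose by the possessor's gender/number.
    return NON_REFLEXIVE[possessor_gender]


if __name__ == "__main__":
    # "La chica comió su manzana" with "su" = "la chica"; яблоко is neuter.
    print("Девушка съела", russian_possessive("chica", "chica", "f", "nt"), "яблоко")
    # prints: Девушка съела своё яблоко
    # "su" referring to another woman mentioned earlier (hypothetical "maría").
    print("Девушка съела", russian_possessive("maría", "chica", "f", "nt"), "яблоко")
    # prints: Девушка съела её яблоко
```

The key input is the verb-side referent: without an antecedent on "comió", the two calls above could not be told apart.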
Re: [Apertium-stuff] beta.apertium.org
Hi Fran (and others),

This is a pretty late update, but as of earlier today the beta site should be updating nightly again!

All the old SVN checkouts were cleared out and replaced with Git clones. Special thanks to unhammer for fixing up apertium-get (and quickly resolving the new issues I made yesterday)!

I also misspoke earlier regarding how APy updates are deployed. Html-tools must be updated via SSH. However, the nightly update script pulls down the APy Docker image from Docker Hub. FWIW, the Docker build had been broken by a switch to pipenv until a couple of days ago, but it should work fine now. The upshot is that updates to things like language names in APy master should be reflected on beta.apertium.org within 24 hours.

Sushain K. Cherivirala

On Sun, Jul 15, 2018 at 9:51 PM Sushain Cherivirala wrote:
> Fran,
>
> In theory, the pairs are updated nightly automatically:
>
>     @daily cd /home/apertium/beta/apertium-html-tools && ./tools/docker/deploy-all-pairs.sh
>
> However, due to the following issue, that script hasn't done much for a while:
>
> https://github.com/apertium/apertium-get/issues/7
>
> Updates to the APy/Html-tools code are manual, via SSH into the projectjj machine. Continuous deployment could be set up, but it doesn't seem like it would add much value.
>
> Sushain K. Cherivirala
> Software Engineer Intern, Stripe
>
> On Sun, Jul 15, 2018 at 3:32 AM, Francis Tyers wrote:
>> Does anyone know how this works? It would be good to have documentation on the Wiki:
>>
>> http://wiki.apertium.org/wiki/Beta
>>
>> * How often are pairs there updated? Is it nightly?
>> * Who can update the pairs there?
>> * What needs to be done in order to update them?
>>
>> If someone knows, please reply here and I'll summarise the wisdom on the Wiki.
>>
>> Thanks!
>>
>> Fran
[Apertium-stuff] Anaphora Resolution and Long Distance Agreement Resolution
Hi,

I have submitted a draft for review for the GSoC 2019 project "Anaphora Resolution". The project will also include a tool for resolving adjective agreement in Spanish, Catalan and other languages that need it.

You can find the proposal here: http://wiki.apertium.org/wiki/User:Khannatanmai

If anyone has any comments, suggestions, criticism or ideas, I would really appreciate it if you let me know, as it will help me make a stronger proposal and build a better tool for Apertium during GSoC 2019.

Thanks and regards,
Tanmai Khanna
IRC: khannatanmai