Re: [Apertium-stuff] Formally deprecate lttoolbox-java?

2023-03-27 Thread Marc Riera Irigoyen
Hello,

For the sake of completeness, the Apertium plugin bundled with OmegaT added
support for a custom API URL a couple of years ago, so it is possible to
interact with a local installation of Apertium via Apertium-APy. It is not
as simple as apertium-omegat-native, but it may be enough for many people
until Linux support is available.
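
As a rough illustration of what talking to a local APy instance looks like, here is a minimal Python sketch that builds a request for APy's /translate endpoint and reads the translated text out of the JSON reply. The host, the port (2737, APy's usual default), the eng|cat pair and the helper names are assumptions for the example, not anything taken from the OmegaT plugin itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local APy instance listening on its usual default port.
APY_BASE_URL = "http://localhost:2737"


def build_translate_url(text, source, target, base_url=APY_BASE_URL):
    """Return the URL for an APy /translate request (hypothetical helper)."""
    query = urlencode({"langpair": f"{source}|{target}", "q": text})
    return f"{base_url}/translate?{query}"


def extract_translation(payload):
    """Pull the translated text out of an APy JSON reply."""
    return json.loads(payload)["responseData"]["translatedText"]


def translate(text, source, target, base_url=APY_BASE_URL):
    """Send the request; this needs a running APy instance to succeed."""
    with urlopen(build_translate_url(text, source, target, base_url)) as reply:
        return extract_translation(reply.read().decode("utf-8"))
```

With a server running, translate("hello", "eng", "cat") would return whatever the installed pair produces; the base URL is the same kind of value you would enter as the custom API URL.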

Regards,

*Marc Riera*


On Mon, 27 Mar 2023 at 10:52, Tino Didriksen wrote:

> True, but it's not much work to make it run on Linux. Just have to call
> gksudo instead of downloading 7z files.
>
> Since it seems OmegaT is important to people, I'll commit to ensuring
> omegat-native works on the same platforms that the nightly builds target,
> before marking lttoolbox-java as deprecated.
>
> -- Tino Didriksen
>
>
> On Mon, 27 Mar 2023 at 10:24, Felipe Sánchez Martínez 
> wrote:
>
>> Hi,
>>
>> According to the wiki this plugin does not work on Linux.
>>
>> Felipe
>> On 26/3/23 at 21:32, Tino Didriksen wrote:
>>
>> The replacement is https://github.com/apertium/apertium-omegat-native -
>> which admittedly also needs updating, but it's far easier to get functional
>> again.
>>
>> -- Tino Didriksen
>>
>>
>> On Sun, 26 Mar 2023 at 21:06, Felipe Sánchez Martínez <
>> fsanc...@dlsi.ua.es> wrote:
>>
>>> Hi all,
>>>
>>> I guess that the Apertium plugin for OmegaT runs lttoolbox-java
>>> internally. I do not have information on the number of people using this
>>> plugin, but I guess there are some (me, for example). How much time would it
>>> take to port the latest developments to Java?
>>>
>>> Regards
>>>
>>> Felipe
>>> On 26/3/23 at 19:12, Tino Didriksen wrote:
>>>
>>> [CC apertium-stuff & PMC]
>>>
>>> Cf. https://github.com/apertium/organisation/issues/34
>>>
>>> Given that lttoolbox-java and its downstream dependents have been
>>> falling further and further behind the native tools, I say we formally
>>> deprecate the Java port, mark all related wiki pages with a warning, and
>>> archive the relevant repos.
>>>
>>> -- Tino Didriksen
>>>
>>>
>>>
>>> ___
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>> --
>>> [image: Universitat d'Alacant / Universidad de Alicante]
>>>
>>> Dept. de Llenguatges i Sistemes Informàtics
>>>
>>> Felipe Sánchez Martínez
>>>
>>> Associate Professor - Profesor Titular de Universidad
>>>
>>> Tel.: (+34) 965 90 34 00 ext. 2966
>>>
>>> Email: fsanc...@ua.es, fsanc...@dlsi.ua.es
>>>
>>> Web: https://www.dlsi.ua.es/~fsanchez/
>>>
>>
>>


Re: [Apertium-stuff] Capitalization Handling

2022-12-27 Thread Marc Riera Irigoyen
Thanks for the great work! I'll make sure to test it with apertium-eng-cat,
which has generation errors due to capitalization.

Happy holidays!

*Marc Riera*


On Sat, 24 Dec 2022 at 14:12, Hèctor Alòs i Font wrote:

> Looks very good, Daniel. Thanks in advance. I'll try to test it in the next
> few days in the pairs I maintain.
> Merry Christmas/Hanukkah/New Year/*.
> Hèctor
>
> On Fri, 23 Dec 2022 at 0:41, Daniel Swanson wrote:
>
>> Greetings Apertiumers!
>>
>> I have two updates to report:
>>
>> First, I have rewritten the postgenerator (again), this time as part
>> of apertium-separable (and so not breaking the old one, unlike last
>> time), and in such a way that postgenerator rules can both match on
>> lemma and tags in addition to surface forms and iteratively apply to
>> their own output.
>>
>> This is available as part of apertium-separable 0.7.0 and is
>> documented at https://wiki.apertium.org/wiki/Postgenerator
>>
>> Second, I just added a pair of modules which move capitalization
>> information into word-bound blanks at the beginning of the pipeline
>> and then reapply them according to LRX-like rules at the end of the
>> pipeline, allowing all intermediate modules to operate solely on
>> dictionary case.
>>
>> This should be available after the next nightly build (i.e. tomorrow)
>> in apertium 3.9.0, and is documented at
>> https://wiki.apertium.org/wiki/Capitalization_restoration
>>
>> If anyone has questions or would like help trying this out for a
>> language pair or if I missed something in the documentation, let me
>> know.
>>
>> Thanks to Kevin Unhammer and Marc Riera for helping me figure out what
>> the design of the capitalization module should be.
>>
>> Merry Christmas,
>> Daniel
>>
>> P.S. To anyone not interested in either of these developments: your
>> Christmas gift is that I accidentally made lexical selection quite a
>> bit faster while I was working on these.
>>
>>


[Apertium-stuff] Repo for testvoc scripts

2022-07-10 Thread Marc Riera Irigoyen
Hello all,

I've taken the liberty of creating a repo (1) to collect all the current
and future scripts used for testvoc.

Currently, each pair has its own set of scripts and development depends on
the pair. This means that, unless the same person develops several pairs or
there is frequent communication, improvements in these scripts are not
propagated between pairs. The idea is to have this repo in common and try
to create and maintain a robust, pair-independent set of scripts for
testvoc or any related tests.

Given that the wiki (2) links to the scripts in apertium-swe-dan and that
these are widely used in other pairs, I've taken these as a starting point.
There are other scripts from other pairs that I'll add later, with the aim
of ultimately combining their features. A conversion to a standalone tool
(like apertium-regtest) may even be desirable (idea for a future GSoC
project?).

The repo is under the Apertium organization in GitHub, but I'm currently
the only one with full privileges. Please tell me which teams/people are
standard for Apertium repos to sort it out. Thanks!

Regards,

*Marc Riera*

(1) https://github.com/apertium/apertium-testvoc-scripts
(2) https://wiki.apertium.org/wiki/Testvoc


Re: [Apertium-stuff] Infinite loop in testvoc???

2020-08-15 Thread Marc Riera Irigoyen
I've been able to reproduce the loop and fix it. It was mainly due to an
unexpected pattern in the testvoc script, but there was also a typo in the
bidix that contributed to the problem.

1. The testvoc script did not account for bidix entries with empty
translations and would add extra slashes in many cases. Slashes are used to
test multiple translations for a single entry, which is done by an awk
script in a while loop that could not be escaped. I have fixed the issue
with the extra slashes and changed the while loop to a for loop limited to 50
iterations. This should be enough for any pair, and the loop includes a
condition to exit before the 50 iterations are reached, so there is no
unnecessary processing. I'll post a pull request directly to the repo with
the fixes shortly.
2. There is an entry in the bidix (and probably in the Arpitan monodix as
well, because it generates properly), "Salinas de Gotari", with a line break
after the last tag. It looks like a typo. It appears to be valid in
Apertium format, but the testvoc script assumes one entry per line, so the
double slashes occurred here too. Thanks to the loop limit, testvoc is no
longer blocked by this entry (and it doesn't appear in the list of
errors, because it generates properly), but it should be fixed.
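
The two script fixes in point 1 can be sketched roughly as follows. This is a Python illustration of the idea only, not the actual testvoc code (which uses awk inside a shell loop): the function name and the simplified `^source/target1/target2$` layout are assumptions, and escaped slashes inside lemmas are not handled.

```python
MAX_ALTERNATIVES = 50  # the cap mentioned above; replaces an open-ended while loop


def expand_alternatives(unit, limit=MAX_ALTERNATIVES):
    """Split '^source/t1/t2$' into one '^source/t$' unit per translation.

    Empty translations (which show up as doubled slashes) are skipped,
    and the loop runs at most `limit` times even on malformed input.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    source, _sep, remaining = body[1:-1].partition("/")
    expanded = []
    for _iteration in range(limit):
        if not remaining:  # early exit: nothing left to peel off
            break
        head, _sep, remaining = remaining.partition("/")
        if head:  # an empty head means an empty translation ('//')
            expanded.append(f"^{source}/{head}$")
    return expanded
```

For an entry such as `^taula/tavolo/tavola/tabella$` this yields three units, one per translation, and an input containing `//` no longer produces an empty alternative.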

Regards,

*Marc Riera*


On Sat, 15 Aug 2020 at 11:53, Marc Riera Irigoyen wrote:

> Hello Hèctor,
>
> I see that the testvoc script you're using is the one I developed based on
> previous scripts used in several pairs. It shouldn't be producing a loop,
> and I have never seen it do so before. Given that it's happening only when
> translating from French to Arpitan, I guess there may be something that I
> didn't account for when developing the script. I'll take a look and try to
> reproduce it.
>
> Regards,
>
> *Marc Riera*
>
>
> On Sat, 15 Aug 2020 at 10:46, Hèctor Alòs i Font wrote:
>
>> I am experiencing very strange behaviour in the fra-frp testvoc. While
>> there is no problem on the frp2fra side (the test finishes in less
>> than 30 minutes on my computer), on the fra2frp side there is a kind of
>> infinite loop. The same file is created and deleted again and again, and
>> testvoc does not end even after waiting more than 24 hours. The file
>> that is deleted and created again and again (always with the same name)
>> has exactly the same content each time. The first lines are:
>>
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^frère$]^frère/~/frâre$+^./~/.$
>>
>> [\^1er$]^1er/~/1ér$+^./~/.$
>>
>> [\^1er$]^1er/~/1ér$+^./~/.$
>>
>> [\^1er$]^1er/~/1ér$+^./~/.$
>>
>> [\^1er$]^1er/~/1ér$+^./~/.$
>>
>> [\^abattu$]^abattu/~/abatu$+^./~/.$
>>
>> [\^abattu$]^abattu/~/dèfêt$+^./~/.$
>>
>> [\^abattu$]^abattu/~/dèchesu$+^./~/.$
>>
>> [\^abattu$]^abattu/~/abatu$+^./~/.$
>>
>> I have never seen such a thing before and I cannot imagine what can cause
>> this behaviour. Any ideas?
>>
>> Hèctor


Re: [Apertium-stuff] Infinite loop in testvoc???

2020-08-15 Thread Marc Riera Irigoyen
Hello Hèctor,

I see that the testvoc script you're using is the one I developed based on
previous scripts used in several pairs. It shouldn't be producing a loop,
and I have never seen it do so before. Given that it's happening only when
translating from French to Arpitan, I guess there may be something that I
didn't account for when developing the script. I'll take a look and try to
reproduce it.

Regards,

*Marc Riera*


On Sat, 15 Aug 2020 at 10:46, Hèctor Alòs i Font wrote:

> I am experiencing very strange behaviour in the fra-frp testvoc. While
> there is no problem on the frp2fra side (the test finishes in less
> than 30 minutes on my computer), on the fra2frp side there is a kind of
> infinite loop. The same file is created and deleted again and again, and
> testvoc does not end even after waiting more than 24 hours. The file
> that is deleted and created again and again (always with the same name)
> has exactly the same content each time. The first lines are:
>
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^frère$]^frère/~/frâre$+^./~/.$
>
> [\^1er$]^1er/~/1ér$+^./~/.$
>
> [\^1er$]^1er/~/1ér$+^./~/.$
>
> [\^1er$]^1er/~/1ér$+^./~/.$
>
> [\^1er$]^1er/~/1ér$+^./~/.$
>
> [\^abattu$]^abattu/~/abatu$+^./~/.$
>
> [\^abattu$]^abattu/~/dèfêt$+^./~/.$
>
> [\^abattu$]^abattu/~/dèchesu$+^./~/.$
>
> [\^abattu$]^abattu/~/abatu$+^./~/.$
>
> I have never seen such a thing before and I cannot imagine what can cause
> this behaviour. Any ideas?
>
> Hèctor


Re: [Apertium-stuff] Get a glossary from a TMX memory

2020-06-28 Thread Marc Riera Irigoyen
Hello there,

I assume that you want to convert the pairs from the TMX memory into a
simpler format that can be used to create Apertium dictionary entries. Is
that correct? In that case, you could take a look at Okapi Framework and
convert the TMX file to a tab-separated text file with Okapi Rainbow:
https://okapiframework.org/wiki/index.php/Rainbow

Okapi may be a little intimidating at first, but the documentation is quite
complete. Basically, you execute a pipeline of steps over a set of
documents, similarly to how Apertium chains modules. There is a predefined
pipeline for format conversion in the Rainbow interface that you will be
able to use.
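
If a graphical tool is more than you need, the same conversion can be sketched in a few lines of Python. This is not what Rainbow does internally, just a minimal alternative under some assumptions: the function names are made up, the language codes are matched exactly (so "it" will not match "it-IT"), and inline markup inside segments is flattened to plain text.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_rows(tmx_text, source_lang, target_lang):
    """Yield (source, target) segment pairs from a TMX document."""
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (variant.get(XML_LANG) or variant.get("lang") or "").lower()
            seg = variant.find("seg")
            if seg is not None:
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            yield segments[source_lang], segments[target_lang]


def rows_to_tsv(rows):
    """Serialise the pairs as tab-separated text, one pair per line."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

The resulting two-column text still needs manual review before anything goes into a bidix, since translation-memory segments are usually whole sentences rather than dictionary entries.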

*Marc Riera*


On Sun, 28 Jun 2020 at 12:41, Francis Tyers wrote:

> On 2020-06-28 11:27, Hèctor Alòs i Font wrote:
> > I think nobody answered this post. Gianfranco has a problem in
> > getting bilingual dictionaries from OmegaT and loading them into the
> > Apertium dictionaries.
> >
> > As far as I understood, he tried:
> >
> https://wiki.apertium.org/wiki/Getting_bilingual_dictionaries_from_OmegaWiki
> >
> > I don't know anything on OmegaT, but it would be pretty important for
> > the Italian-Sardinian pair to get data from translation memories used
> > to translate administrative texts in several town councils in
> > Sardinia.
> >
> > Could someone help him?
> >
>
> Without seeing what data he has and what outcome he would like, it is
> difficult to advise further.
>
> Would it be possible to get access to the data? Or is it already in the
> apertium-srd-ita pair?
>
> I agree that this is important! :)
>
> Fran
>
>


Re: [Apertium-stuff] Problems with testvoc

2019-06-29 Thread Marc Riera Irigoyen
Hello Hèctor,

Which script are you using for testvoc? It looks like you are not
trimming the Catalan monodix, so the script is testing every possible
analysis regardless of whether it is in the pair or not.

A couple of months ago I began working on a new testvoc script for
apertium-eng-cat and apertium-ron-cat, based on an old script. My idea
was to develop something portable to any pair without hardcoded values,
so it stores pair-specific settings in a configuration file. It
needs better error handling before it can be properly "released", but it
mostly works and I am sure you will find it useful. You can find it in the
dev/testvoc folder in both pairs.

The script checks for generation errors (including every possible
translation for polysemic entries using lexical selection) and for
double generation (errors in the target monodix). By default, with no
options, the script does a full testvoc and generates a summary, but
there are three options: -e (ignore ; works faster with
Romance languages), -q ("quiet"; does not generate summaries) and -u
("unknowns"; checks for entries in the bidix missing from monodixes,
uses an external script). It will probably be more than enough for your
needs and solve both issues.
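
To make the first issue concrete: in bidix output, an `@` in front of the translation marks a lemma that has no entry in the bilingual dictionary. The sketch below filters such lines out of an existing testvoc output. Note that this is a different, after-the-fact approach from trimming the monodix, and the function names are hypothetical, not part of the script described above.

```python
def in_bidix(line):
    """True if the first lexical unit on the line has a bidix translation."""
    start = line.find("^")
    end = line.find("$", start)
    if start == -1 or end == -1:
        return False
    _source, _sep, translations = line[start + 1:end].partition("/")
    # '@' in front of the translation means the lemma is not in the bidix.
    return bool(translations) and not translations.startswith("@")


def drop_untranslated(lines):
    """Keep only the lines whose lemma is actually covered by the pair."""
    return [line for line in lines if in_bidix(line)]
```

Trimming remains the better fix, because it avoids generating the unwanted forms in the first place instead of discarding them afterwards.
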
Regards,
Marc
On Sat, 29 Jun 2019 at 10:00 +0300, Hèctor Alòs i Font wrote:
> I'm having problems with testvoc. They are of two kinds. The main
> one is that testvoc generates all forms of the lemmas present in the
> monodix, not only the ones existing in the bilingual dictionary.
> This is catastrophic when testing from Catalan, which has tens of
> thousands of lemmas which can't be added to the bidix (and often this
> is not really needed). For instance, for "taula" in apertium-cat-ita:
> ^taula# braser/@taula# braser$ ^./.$
> ^taula# braser/@taula# braser$ ^./.$
> ^taula# de la Llei/@taula# de la Llei$ ^./.$
> ^taula# de la Llei/@taula# de la Llei$ ^./.$
> ^taula# de multiplicar/@taula# de multiplicar$ ^./.$
> ^taula# de multiplicar/@taula# de multiplicar$ ^./.$
> ^taula# de salvació/@taula# de salvació$ ^./.$
> ^taula# de salvació/@taula# de salvació$ ^./.$
> ^taula# d'harmonia/@taula# d'harmonia$ ^./.$
> ^taula# d'harmonia/@taula# d'harmonia$ ^./.$
> ^taula/tavolo/tavola/tabella$ ^./.$
> ^taula/tavolo/tavola/tabella$ ^./.$
> ^taula numèrica/@taula numèrica$ ^./.$
> ^taula numèrica/@taula numèrica$ ^./.$
> ^taula periòdica/@taula periòdica$ ^./.$
> ^taula periòdica/@taula periòdica$ ^./.$
> 
> The second problem is that the script does not include a call to
> lexical selection, so the translation tested is not always the "real"
> one, but sometimes one ruled out by lexical selection.
> 
> I'm solving the second issue (this seems to be trivial), but I'm not
> sure how to deal with the first one. Are there any suggestions?
> 
> Best,
> Hèctor
> 


[Apertium-stuff] Reusing Apertium linguistic data and licensing

2019-03-12 Thread Marc Riera Irigoyen
Hello,

Jaume Ortolà (contributing mainly to apertium-spa-cat) and I are preparing
a project to create a free ("libre") English-Catalan dictionary for
Softcatalà [1]. This is part of a call by Fundació.cat [2] for projects
promoting the Catalan language in technology, which will receive funding if
selected.

Our goal is to compile data from other sources (including terminology from
Termcat [3] and entries from DACCO [4]) and revise it. In addition, given our
involvement in Apertium, we have been thinking especially about its
English-Catalan pair as a potential source and destination of data. It could
be a great opportunity to expand the pair's bilingual dictionary with a large
number of high-quality entries.

However, we are not sure about the potential licensing limitations this
could pose. Apertium is licensed under GPLv3, yet the other sources we have
found so far are licensed under CC-BY-SA. We know, for example, that
CC-BY-SA is one-way compatible with GPLv3 since version 4.0, which would
allow us to later include the data in Apertium, but not the other way
around. We have no specific license in mind for the project yet; we want to
release the data and source for free for everyone to use and reuse, but the
fact that this project involves data from different sources with different
licenses makes everything a bit convoluted.

Does anyone know which options we have for reusing Apertium data
in such a project?

Thank you very much in advance,

*Marc Riera*

[1] https://www.softcatala.org
[2] https://convocatoria.fundacio.cat
[3] http://www.termcat.cat
[4] http://www.catalandictionary.org


Re: [Apertium-stuff] Romanian-Catalan pair ready for release

2019-01-24 Thread Marc Riera Irigoyen
Just a kind reminder that apertium-ron-cat is ready for release on
apertium.org and has not been released yet. The same applies to
apertium-eng-cat. Both pairs were made public in December in Softcatalà's
deployment of Apertium and no further bugs preventing release were found,
so it is safe to release them.

Thanks!

*Marc Riera*


On Wed, 28 Nov 2018 at 18:29, Marc Riera Irigoyen wrote:

> Hello all,
>
> I am glad to announce that the Romanian-Catalan pair (apertium-ron-cat) is
> now ready to be packaged and released to the public. The pair should also
> be moved to trunk (it is currently in the nursery).
>
> This pair was developed as part of GSoC 2018 and for the first time
> provides direct translation between the two languages (other translation
> platforms use English as a pivot language). It still falls behind
> other platforms, especially in the Catalan>Romanian direction, which is the
> less developed one and has a WER/PER of 52.9%/44.8%, but I think the results
> for Romanian>Catalan are good enough (39%/28.8% WER/PER) considering the
> limited development time. For comparison, Google and Yandex score
> 12.6%/8.7% and 29.3%/21.4% respectively when translating from Romanian to
> Catalan.
>
> I would also like to take this opportunity to remind everyone that the
> updated English-Catalan pair (apertium-eng-cat) developed as part of GSoC
> 2017 is still due for packaging and release. It was previously blocked by a
> few bugs in apertium-separable and apertium-tagger, but as far as I know
> these have been fixed.
>
> Thanks to everyone for the support!
>
> *Marc Riera*
>


[Apertium-stuff] Romanian-Catalan pair ready for release

2018-11-28 Thread Marc Riera Irigoyen
Hello all,

I am glad to announce that the Romanian-Catalan pair (apertium-ron-cat) is
now ready to be packaged and released to the public. The pair should also
be moved to trunk (it is currently in the nursery).

This pair was developed as part of GSoC 2018 and for the first time
provides direct translation between the two languages (other translation
platforms use English as a pivot language). It still falls behind
other platforms, especially in the Catalan>Romanian direction, which is the
less developed one and has a WER/PER of 52.9%/44.8%, but I think the results
for Romanian>Catalan are good enough (39%/28.8% WER/PER) considering the
limited development time. For comparison, Google and Yandex score
12.6%/8.7% and 29.3%/21.4% respectively when translating from Romanian to
Catalan.
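
For readers unfamiliar with the two metrics, here is a small Python sketch of how word error rate (WER) and position-independent error rate (PER) can be computed for one sentence against a reference. It is not the tool used for the figures above. The PER formula is one common formulation and is an assumption here, and tokenisation is plain whitespace splitting.

```python
from collections import Counter


def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Edits needed to turn the hypothesis into the reference, per reference word."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Like WER, but ignoring word order (bag-of-words comparison)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because PER ignores word order, it is never higher than WER for the same sentence pair, which matches the pattern in the figures quoted above.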

I would also like to take this opportunity to remind everyone that the
updated English-Catalan pair (apertium-eng-cat) developed as part of GSoC
2017 is still due for packaging and release. It was previously blocked by a
few bugs in apertium-separable and apertium-tagger, but as far as I know
these have been fixed.

Thanks to everyone for the support!

*Marc Riera*


Re: [Apertium-stuff] Apertium eng-cat release

2018-05-29 Thread Marc Riera Irigoyen
Hello,

Seeing that another pair depending on apertium-cat (apertium-spa-cat) is
getting a release, I think it is a good idea to finally release the new
version of apertium-eng-cat. The bug in apertium-separable which also
affected apertium-fra-cat is now fixed (as far as I know), and the release
has been kept on hold for quite a long time already. Given that the pair is
clean with the latest commit in the apertium-cat repository, I think it
makes sense to push it forward now.

Thanks!

Marc

On Mon, 12 Mar 2018 at 19:39, Hèctor Alòs i Font wrote:

> 2018-03-12 20:59 GMT+03:00 Marc Riera Irigoyen <
> marc.riera.irigo...@gmail.com>:
>
>> What kind of lexical coverage do Google/Yandex have ?
>>>
>>
>> This text shows about 98% coverage for Google and 97% coverage for
>> Yandex, based on words left untranslated.
>>
>
> But I think these coverages are not comparable to the ones in Apertium,
> where we count as uncovered mostly proper nouns that often don't have to be
> translated. How do you compare Apertium's and Yandex's or Google's
> coverages, Fran?
>
> By the way, Marc, you said that
>
> I have not really evaluated translations from Catalan (most of the
>>>> development has taken place in the other direction), but it should be
>>>> more or less the same as the old pair.
>>>>
>>>
>>>
> I bet it is not. At least for French-Catalan the translation from Catalan
> is a lot worse than the translation from French. Among other things, that's
> because this side is much harder because of the personal pronouns. It is
> not very difficult to delete them from French, but it is very difficult to
> add them (only if needed) from Catalan. With a shallow syntactic analysis
> it is almost impossible. There are lots of problems because of that from
> Catalan to French. Often the subject pronoun is missing or, on the
> contrary, it is added when it should not be.
>


-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] [urgent] Java Apertium malfunctions

2018-05-10 Thread Marc Riera Irigoyen
Thank you very much Jacob for the quick fix!

Unfortunately, I can't seem to get past Apertium Viewer's splash screen.
The program shows the Apertium logo for a while and then silently fails. If
I run it from the terminal I get this error message:

Exception in thread "AWT-EventQueue-0" java.lang.NoClassDefFoundError:
org/apertium/pipeline/Mode

Marc

On Thu, 10 May 2018 at 9:46, Mikel L. Forcada <m...@dlsi.ua.es> wrote:

> Hi Jacob!
> This is oh so cool! Thank you very much! I have changed the wiki pages to
> point directly to the new .jar files.
> I have checked Apertium-caffeine and works great.
> I will check Apertium-omegat today.
> Thank you all folks for such a quick response!
> Cheers
> Mikel
>
> On 10/05/18 at 08:31, Jacob Nordfalk wrote:
>
> I've compiled and released new versions.
>
> As with apertium-viewer and the Android app, please check and confirm
> that they work for you.
>
> https://github.com/apertium/apertium-omegat/releases/tag/v1.0
>
> https://github.com/apertium/apertium-caffeine/releases/tag/v1.0
>
> Yours,
> Jacob
>
> 2018-05-09 22:53 GMT+02:00 Mikel L. Forcada <m...@dlsi.ua.es>:
>
>> Folks,
>> it would be great to have
>> (a) a ready-made build/release for apertium-caffeine in GitHub which I
>> can link from the wiki. I tried compiling (ant) but I seem to have some
>> mismatch between my javac and what's required in build.xml. I can try to
>> fix this with your help, but I'm sure this would be easily fixed by any of
>> you java folks!
>> (b) a GitHub repository for apertium-omegat, including
>> builds/releases that I can link from the wiki (many people use it from
>> inside OmegaT, no matter how deprecated this is).
>> Could you guys help me?
>> Cheers
>> Mikel
>>
>> 2018-05-09 18:05 GMT+02:00 Jacob Nordfalk <jacob.nordf...@gmail.com>:
>>
>>> I've released a new version.
>>>
>>> Download latest apertium-viewer.jar
>>> <https://github.com/apertium/apertium-viewer/releases> and save it to
>>> your hard drive.
>>>
>>> wget https://github.com/apertium/apertium-viewer/releases/download/2.5.4/apertium-viewer.jar -O apertium-viewer.jar
>>> java -Xmx500m -jar apertium-viewer.jar
>>>
>>> When you have checked it works for you, please write to me and I'll
>>> publish a notice to everyone currently using it, with instructions on how
>>> to update.
>>>
>>>
>>>
>>>
>>> 2018-05-09 16:29 GMT+02:00 Tino Didriksen <tino.didrik...@gmail.com>:
>>>
>>>> I've created a Java team https://github.com/orgs/apertium/teams/java
>>>> with admin access to the 4 Java-based repositories. Current members
>>>> nordfalk,  artetxem, and myself.
>>>>
>>>> Also imported https://github.com/apertium/apertium-viewer
>>>>
>>>> -- Tino Didriksen
>>>>
>>>>
>>>> On 9 May 2018 at 16:09, Marc Riera Irigoyen <
>>>> marc.riera.irigo...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I was presenting Apertium to a group of students today and we failed
>>>>> to get the online mode in Apertium Viewer to work. It seems that the SVN
>>>>> paths are broken in there too (it complains about being unable to resolve
>>>>> the URLs).
>>>>>
>>>>> Regards,
>>>>>
>>>>> Marc
>>>>>
>>>>> On Wed, 9 May 2018 at 15:54, Jacob Nordfalk <jacob.nordf...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>> However, I can't push my changes to
>>>>>>> https://github.com/apertium/apertium-android
>>>>>>>
>>>>>>>
>>>>>> Sorry, I have access, just forgot to accept the invitation :-)
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jacob Nordfalk <http://profiles.goog

Re: [Apertium-stuff] [urgent] Java Apertium malfunctions

2018-05-09 Thread Marc Riera Irigoyen
Hello!

I was presenting Apertium to a group of students today and we failed to get
the online mode in Apertium Viewer to work. It seems that the SVN paths are
broken in there too (it complains about being unable to resolve the URLs).

Regards,

Marc

On Wed, 9 May 2018 at 15:54, Jacob Nordfalk wrote:

>
>> However, I can't push my changes to
>> https://github.com/apertium/apertium-android
>>
>>
> Sorry, I have access, just forgot to accept the invitation :-)
>
>
>
>
>> --
>> Jacob Nordfalk 
>> Android developer and instructor at DTU
>>
>> Tel. 26206512 - javabog.dk
>>
>
>
>


Re: [Apertium-stuff] GSoC 2018: Romanian-Catalan and pair upgrade

2018-03-26 Thread Marc Riera Irigoyen
Hi Hèctor,

Thanks for the reply; what you raise is very interesting. Moving from
monolingual dictionaries kept inside each pair to external ones shared
between pairs has saved a very considerable amount of effort and,
fortunately, the new pairs and a good part of the old ones use this system,
but many still do not. The conversion itself is nothing out of the ordinary,
but changing the monolingual dictionaries almost always causes errors to
appear in testvoc. If a developer (existing or new) wants to add something to
one of these pairs, they will probably consider updating the pair to the new
system, but if they only want to add one entry and know that the update will
break 5000 entries (for example), they will not see it as a good investment
and will prefer not to do it. So it is understandable that some packages
have gradually become outdated.

Obviously, the ideal solution for Apertium would be for someone who knows
both languages of these pairs to treat them with love and take on the
update, and afterwards not only fix testvoc but also check for possible
regressions and "lost" entries, as you rightly describe. Unfortunately, the
reality is that not everyone can devote that much time to Apertium, so other
solutions have to be found.

Apertium has a great many languages and I only know a few of them, so I am
not the right person to do in-depth testing on pairs I do not know. What I
propose is simply to update at least four pairs and make sure there are no
visible generation errors. One could say this is leaving the job half done,
but I think it is an important step that can make developing these pairs
afterwards a much more manageable effort. I have not decided which pairs to
update; some are simpler and some more complex. What I am clear about is
that if there were time left over (in case the updated pairs turned out to
be simple) I would invest it in updating more pairs, and otherwise I would
try to make up for it. Either way, and also in light of my own experience, I
think updating these pairs benefits all of us a great deal.

Marc

On 26 March 2018 at 8:39, Hèctor Alòs i Font <
h.a...@esperanto.cat> wrote:

> Hi Marc,
>
> The proposal seems excellent to me, and the work done over the whole of
> the past year on eng-cat and the cat monolingual dictionary more than
> vouches for you. The cat-ron pair is fun, besides being socially useful,
> as you say. If there is one thing to comment on, it is the last part of
> the project. Do you think one week per pair (one or two directions) will
> be enough? For pairs that only have a single direction, verifying the
> migration with the testvoc alone seems insufficient to me. For example, if
> we have cat > epo and a Catalan word has been corrected, say, from n.mf to
> two entries n.m and n.f, the translation will be lost and the testvoc will
> not warn about the problem. With pairs in which both directions are open
> and, therefore, the testvoc has to pass in both directions, this kind of
> problem will show up. That is why, for cases like Catalan-Italian and
> Catalan-Esperanto, this strategy seems insufficient to me. A simple (and
> superficial) possibility would be to compare a translation made with the
> current system against one made with the new system. What would happen
> when doing this is that the great majority of the differences will surely
> be due to changes in morphological disambiguation. In fact, it would be
> worth checking whether these changes in disambiguation negatively affect
> anything in the new versions of the pairs.
>
> Best regards,
> Hèctor
>
> 2018-03-25 2:21 GMT+03:00 Marc Riera Irigoyen <
> marc.riera.irigo...@gmail.com>:
>
>> Dear Apertiumers,
>>
>> I would like to share with you my GSoC proposal to adopt the unreleased
>> Romanian-Catalan pair and upgrade several (to be defined) old pairs to the
>> monolingual module system. You can find all the details here:
>>
>> Google Docs:
>> https://docs.google.com/document/d/1pNTebWpQJP_V_2vAJFvbhorBr6dcrgjfF-O2Hc5jWak/edit?usp=sharing
>> Apertium wiki: http://wiki.apertium.org/wiki/User:Marcriera/Proposal2018
>>
>> I will be happy to receive your feedback and answer any questions you may
>> have.
>>
>> Thanks,
>>
>> Marc

[Apertium-stuff] GSoC 2018: Romanian-Catalan and pair upgrade

2018-03-24 Thread Marc Riera Irigoyen
Dear Apertiumers,

I would like to share with you my GSoC proposal to adopt the unreleased
Romanian-Catalan pair and upgrade several (to be defined) old pairs to the
monolingual module system. You can find all the details here:

Google Docs:
https://docs.google.com/document/d/1pNTebWpQJP_V_2vAJFvbhorBr6dcrgjfF-O2Hc5jWak/edit?usp=sharing
Apertium wiki: http://wiki.apertium.org/wiki/User:Marcriera/Proposal2018

I will be happy to receive your feedback and answer any questions you may
have.

Thanks,

Marc


Re: [Apertium-stuff] Apertium eng-cat release

2018-03-12 Thread Marc Riera Irigoyen
>
> What kind of lexical coverage do Google/Yandex have ?
>

This text shows about 98% coverage for Google and 97% coverage for Yandex,
based on words left untranslated.
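
As a rough illustration of how a coverage figure "based on words left
untranslated" can be computed, here is a minimal sketch. The function names
and the toy sentence are hypothetical, and the list of untranslated words is
assumed to have been collected by hand from the system output; this is not
the script used for the figures above.

```python
import re


def tokenize(text):
    """Split text into lower-cased word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(source_text, untranslated_words):
    """Percentage of source tokens that were NOT left untranslated.

    `untranslated_words` is an assumed, manually collected list of source
    words that appeared unchanged in the translation.
    """
    tokens = tokenize(source_text)
    if not tokens:
        return 0.0
    unknown = {word.lower() for word in untranslated_words}
    missed = sum(1 for token in tokens if token in unknown)
    return 100.0 * (len(tokens) - missed) / len(tokens)


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    print(coverage(text, ["Seven"]))
```

Counting tokens rather than distinct words means a frequent unknown word
lowers the figure more than a rare one, which seems closer to what a reader
of the translated text experiences.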

> What kind of effort/work do you think needs to be done to approach Google's
> quality?
> What would you say the main needs are now ?
>

The coverage is now fairly good thanks to the work done during GSoC, even
with proper nouns, but the rules have not changed much, so a good idea
would be to improve and expand them to support more patterns (English word
order changes in questions, for example, are not supported, and they are
very frequent). Of course, more work is needed on the tagger too to ensure
the rules are applied in every compatible case.

These needs are strongly related to approaching Google's translation
quality. Google Translate, thanks to it being based on corpora, has a lot
of information about different types of texts. Even if a single text is a
mix of different styles, it can easily handle them. Apertium, however, needs
to know specifically about every possible pattern and style, something
which is reflected in neither the corpus used for tagger training nor
the transfer rules. Hence, while it can work well for what it "knows"
about, once it is given something different, it outputs funny results.

One of my main goals for the near future is to rewrite everything related
to verbs to take advantage of three-staged transfer. Most of the current
rules have seen minor modifications since the switch from one-staged to
three-staged transfer and only apply for very specific patterns; a good
rewrite should offer noticeable improvements.

2018-03-12 12:21 GMT+01:00 Francis Tyers <fty...@prompsit.com>:

> On 2018-03-12 12:10, Marc Riera Irigoyen wrote:
>
>>> Have you done any evaluation ? How does it compare to other systems
>>> (and the old system too) ? :)
>>>
>>
>> The pair works fairly well with encyclopedia-like texts, and has a
>> good Wikipedia coverage (92% for English and 87% for Catalan). The
>> reference translation (an English article on Greece not used during
>> development) shows a WER/PER of 51%/35%, better than the old pair's
>> 56%/40% with the same text. Yandex is slightly better than Apertium,
>> with 56%/34%, and Google stands with the best results (43%/26%). I
>> have not really evaluated translations from Catalan (most of the
>> development has taken place in the other direction), but it should be
>> more or less the same as the old pair.
>>
>
> Good to know that we are approaching the quality of Yandex! :)
>
> What kind of effort/work do you think needs to be done to approach Google's
> quality?
>
> What kind of lexical coverage do Google/Yandex have ?
>
>> While the pair still needs a lot of work and love, the rewrite has
>> eased development. With good taggers on both sides, trained with
>> diverse texts (including dialogues to reflect oral language
>> constructions), as well as a reorganization/rewrite of the transfer
>> rules (inherited from the messy old pair), we should have a very
>> decent and useful language pair.
>>
>
> What would you say the main needs are now ?
>
> Fran
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] Apertium eng-cat release

2018-03-12 Thread Marc Riera Irigoyen
>
> Have you done any evaluation ? How does it compare to other systems (and
> the old system too) ? :)
>

The pair works fairly well with encyclopedia-like texts, and has a good
Wikipedia coverage (92% for English and 87% for Catalan). The reference
translation (an English article on Greece not used during development)
shows a WER/PER of 51%/35%, better than the old pair's 56%/40% with the
same text. Yandex is slightly better than Apertium, with 56%/34%, and
Google stands with the best results (43%/26%). I have not really evaluated
translations from Catalan (most of the development has taken place in the
other direction), but it should be more or less the same as the old pair.
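
For readers unfamiliar with the two metrics, the sketch below shows one
common way of computing them: WER as the word-level edit distance divided by
the reference length, and PER as the same idea ignoring word order. The
function names and sample sentences are hypothetical, and the exact
definitions used by the evaluation tool behind the figures above may differ
slightly.

```python
from collections import Counter


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, as a percentage of the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * word_edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: like WER, but word order is ignored."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat on the mat sat"
    print(wer(reference, hypothesis), per(reference, hypothesis))
```

Because PER ignores order, it is never higher than WER under these
definitions, which matches the pattern in the figures quoted above.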

While the pair still needs a lot of work and love, the rewrite has eased
development. With good taggers on both sides, trained with diverse texts
(including dialogues to reflect oral language constructions), as well as a
reorganization/rewrite of the transfer rules (inherited from the messy old
pair), we should have a very decent and useful language pair.

Thanks for your support!

Marc

2018-03-11 23:59 GMT+01:00 Francis Tyers <fty...@prompsit.com>:

> On 2018-03-11 23:41, Marc Riera Irigoyen wrote:
>
>> Dear Apertiumers,
>>
>> After intense work during last year's GSoC and the following months,
>> I'm glad to announce that the apertium-eng-cat pair, currently in
>> apertium-incubator, is finally testvoc clean and ready for trunk. This
>> is a rewrite and a replacement of the original English-Catalan (en-ca)
>> pair, which was becoming increasingly out of date and hard to
>> maintain.
>>
>> The new pair includes everything the old pair did (rule-wise), but has
>> a considerably larger dix (~65,000 stems) and features several
>> innovations compared to what we had before:
>>
>> * Lexical selection rules (mainly eng>cat)
>> * Perceptron tagger for English
>> * Constraint Grammar
>> * Apertium-separable module
>>
>> These changes are important not only because of the improvements, but
>> also because Java compatibility cannot be kept (as with
>> apertium-fra-cat). As there is no possible fallback mode in the new
>> pair until these modules get ported to Java or a different approach
>> with C++/Java is taken, the best idea could be to temporarily use the
>> old pair as fallback for the new one.
>>
>> I will keep doing my best to improve the quality of this pair and use
>> it as a test bench for innovative modules and development approaches.
>>
>>
> PS. Congratulations on the release! :)
>
> F.
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] Apertium eng-cat release

2018-03-11 Thread Marc Riera Irigoyen
Dear Apertiumers,

After intense work during last year's GSoC and the following months, I'm
glad to announce that the apertium-eng-cat pair, currently in
apertium-incubator, is finally testvoc clean and ready for trunk. This is a
rewrite and a replacement of the original English-Catalan (en-ca) pair,
which was becoming increasingly out of date and hard to maintain.

The new pair includes everything the old pair did (rule-wise), but has a
considerably larger dix (~65,000 stems) and features several innovations
compared to what we had before:

* Lexical selection rules (mainly eng>cat)
* Perceptron tagger for English
* Constraint Grammar
* Apertium-separable module

These changes are important not only because of the improvements, but also
because Java compatibility cannot be kept (as with apertium-fra-cat). As
there is no possible fallback mode in the new pair until these modules get
ported to Java or a different approach with C++/Java is taken, the best
idea could be to temporarily use the old pair as fallback for the new one.

I will keep doing my best to improve the quality of this pair and use it as
a test bench for innovative modules and development approaches.

Marc


Re: [Apertium-stuff] Apertium-eng generation errors

2018-01-09 Thread Marc Riera Irigoyen
Thanks Fran, that makes a lot of sense. I will update some entries that do
not follow that style and should be using it.

However, what should I do with proper nouns such as "MacDonald/Macdonald"?
Both forms are used and depend on personal preferences. Adding both as
separate entries in apertium-eng and the bidix seems to work, but they pop
up during testvoc because there is no way to tell one from the other when
used in all caps ("MACDONALD"), and Apertium does the Arctic/Arctic thing.
Keeping only "Macdonald" in apertium-eng and both entries in the bidix
fixes the all-caps translations, but "MacDonald" becomes "Macdonald" right
at the end of the pipeline: the capital D is lost during generation in
apertium-eng.
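
To make the ambiguity concrete, here is a small sketch that groups
dictionary lemmas by their all-caps form and reports the groups with more
than one member. It only illustrates why the two spellings cannot be told
apart once the text is upper-cased; it does not reproduce how lttoolbox
handles case, and the function name and word list are hypothetical.

```python
def all_caps_collisions(lemmas):
    """Return {ALL-CAPS form: [lemmas]} for lemmas that collide when upper-cased."""
    groups = {}
    for lemma in lemmas:
        groups.setdefault(lemma.upper(), []).append(lemma)
    # Keep only the upper-case forms shared by two or more distinct entries.
    return {caps: names for caps, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    print(all_caps_collisions(["MacDonald", "Macdonald", "Arctic"]))
```

Running a check like this over the proper-noun entries before testvoc could
list the pairs of spellings that need a decision.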

Marc

2018-01-09 9:14 GMT+01:00 Francis Tyers <fty...@prompsit.com>:

> On 2018-01-08 11:31, Marc Riera Irigoyen wrote:
>
>> Hello all,
>>
>> I've noticed some weird behaviour when generating English output in
>> the English-Catalan pair. When translating "àrtic" to English, for
>> example, the adjective "Arctic" is found, but right at the end of the
>> pipeline, in the English generator, "Arctic" becomes
>> Arctic/Arctic and is sent as is in the output text.
>>
>> The entry in the English monodix is "arctic" without caps, but this was
>> the case in the old en-ca pair and it worked. Moreover, I've altered
>> the bidix to simulate the same situation when generating to Catalan
>> and it works as expected, so it looks like it's specific to English.
>>
>> Anyone who knows what's going on here?
>>
>>
>   rctic n="expensive__adj"/>
>   arctic
>
> two entries. My proposal is to keep the first one and delete the second
> one.
>
> For adjectives/nouns that must always be written in uppercase then the
> first style
> of entry should be used. This avoids generation issues when the case
> changing in
> transfer means that some open class comes out in the wrong orthographic
> case.
>
> Fran
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] Apertium-eng generation errors

2018-01-08 Thread Marc Riera Irigoyen
Hello all,

I've noticed some weird behaviour when generating English output in the
English-Catalan pair. When translating "àrtic" to English, for example, the
adjective "Arctic" is found, but right at the end of the pipeline, in the
English generator, "Arctic" becomes Arctic/Arctic and is sent as is in
the output text.

The entry in the English monodix is "arctic" without caps, but this was the
case in the old en-ca pair and it worked. Moreover, I've altered the bidix
to simulate the same situation when generating to Catalan and it works as
expected, so it looks like it's specific to English.

Anyone who knows what's going on here?

Thanks!

Marc


Re: [Apertium-stuff] Using in bidix

2017-11-04 Thread Marc Riera Irigoyen
That's what I was thinking about; sorry for my bad explanation. It would
basically be taking advantage of apertium-separable to merge everything in
the lemma and then writing the corresponding entries in the bidix.

2017-11-04 17:25 GMT+01:00 Francis Tyers <fty...@prompsit.com>:

> On 2017-11-04 16:58, Marc Riera Irigoyen wrote:
>
>> I was thinking something like merging "casar-se" and "amb" using
>> apertium-separable to get only "marry" when translating into English
>> (to avoid writing a long list of verbs using "amb" in the transfer
>> rules). As I said, it's just a crazy idea I came up with when thinking
>> about things that currently need to be handled better.
>>
>
> Yeah, that's fine, but the way I would do it would not be to get rid
> of the amb, but rather move it to the lemma, e.g.
>
> IN: ^es$ ^casa$ ^amb$
> OUT: ^casar-se# amb$
>
> :)
>
> Fran
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] Apertium-ca-ro

2017-11-04 Thread Marc Riera Irigoyen
Hello all,

I have noticed that there is a Romanian-Catalan pair in the nursery that
has not been updated for some time. From what I have guessed from SVN
history, it was based on the released Romanian-Spanish pair and probably
took advantage of crossdics. While the pair has around 100 rules, it only
works from Romanian to Catalan: there are no rules for the other direction,
and it is full of generation errors. It is also self-contained, as it does
not make use of standalone language modules.

Even though I am currently working on English-Catalan to release an update
as soon as possible, I think it would be great to work on this pair to
bring it up to date. Given that Romanian and Catalan are both Romance
languages with lots of similarities, that there is a good amount of
documentation about Romanian in the Apertium wiki (even transfer rules) and
that I have studied Romanian and can speak it to a certain extent, it
should not be a difficult task and we could get a very decent language pair.

This is a basic idea with no further planning, and depending on my spare
time it may not become real until next year's GSoC, but it would be great
to hear about your thoughts or even meet someone interested or who took
part in the development of the Romanian-Spanish pair.

Thanks and see you!

Marc

-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] Comments in transfer macros

2017-08-17 Thread Marc Riera Irigoyen
Hello,

While discussing possible ways of documenting transfer rules for the
English-Catalan pair, we came up with the idea of writing rule
documentation directly to the T1X, T2X and T3X files to then build wiki
pages automatically with the data. You can check an example (work in
progress) here:
http://wiki.apertium.org/wiki/English_and_Catalan/Transfer_Rules

However, even though rules were allowed to have a specific "comment" for
this purpose, macros were not, and Apertium refused to compile transfer
files with comments in macros. I have now added support for optional
comments in the macros too (just like it was done some time ago with
comments in the rules). This tiny change will not affect anything else, and
will hopefully encourage other developers to write better documentation for
language pairs.
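
As an illustration of the idea, the sketch below pulls comments out of a
transfer file and turns them into a wiki table. It is only a sketch under
stated assumptions: the sample file is a heavily simplified, hypothetical
fragment rather than a valid T1X file, the attribute is assumed to be called
"comment" on both rules and macros, and the function names are invented
here. It is not the script used for the wiki page linked above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified transfer file used only for this example.
SAMPLE_T1X = """<transfer>
  <section-def-macros>
    <def-macro n="f_concord" npar="2" comment="Gender and number agreement">
    </def-macro>
  </section-def-macros>
  <section-rules>
    <rule comment="RULE: DET NOM">
      <pattern><pattern-item n="det"/><pattern-item n="nom"/></pattern>
      <action/>
    </rule>
    <rule>
      <pattern><pattern-item n="nom"/></pattern>
      <action/>
    </rule>
  </section-rules>
</transfer>"""


def collect_comments(xml_text):
    """Return (kind, name, comment) tuples for every macro and rule."""
    root = ET.fromstring(xml_text)
    rows = []
    for macro in root.iter("def-macro"):
        rows.append(("macro", macro.get("n"), macro.get("comment", "")))
    # Rules have no name, so they are identified by their position in the file.
    for number, rule in enumerate(root.iter("rule"), 1):
        rows.append(("rule", str(number), rule.get("comment", "")))
    return rows


def to_wiki_table(rows):
    """Render the rows as a MediaWiki table."""
    lines = ['{| class="wikitable"', "! Kind !! Name !! Comment"]
    for kind, name, comment in rows:
        lines.append("|-")
        lines.append(f"| {kind} || {name} || {comment}")
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_wiki_table(collect_comments(SAMPLE_T1X)))
```

Rules without a comment still get a row with an empty cell, which makes
undocumented rules easy to spot on the generated page.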

Regards,

Marc


Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-25 Thread Marc Riera Irigoyen
Add the  tag without a blank between it and the previous lemma:


  

  
  






  

  

  


This should do the trick.

2017-07-25 12:46 GMT+02:00 Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de>:

> [Replies inline]
> On Mon, 24 Jul 2017 23:15:15 +0200
> Marc Riera Irigoyen
> <marc.riera.irigo...@gmail.com> wrote:
>
> > Sorry, I made a mistake in the reference bidix entry in the last
> > message that could break pairs already using genitives. The good
> > entry should be like this (what I originally said in a previous
> > message):
> >
> > 's
> >
> > I have tweaked the CG rules and now everything should work even
> > better. Thanks!
>
> Thanks for the update. I have changed fin-eng a bit and set up some
> tests using Apertium's canonical standard testing methods like[1]. The
> only issue I have is generating, I use:
>
> 
>   
> 
> 
>   
>   
> 
>   
> 
> 
>   
> 
>   
> 
> 
>   
>   
>   
>   
>   
>   
>   
> 
> 
> 
> 
> 
>   
> 
>   
>   
>
> But I seem to get:
>
> fin-eng   Jamesin kissa.
> - James' cat.
> + James's/' cat.
>
>
> fin-eng   Jackin kissa.
> - Jack's cat.
> + Jack's/' cat.
>
>
> fin-eng   pojan kissa.
> - boy's cat.
> + boy's/' cat.
>
> Other way it goes almost right, except for the lexical selection:
>
> eng-fin   James' cat.
> - Jamesin kissa.
> + James’ kolli.
>
>
> eng-fin   Jack's cat.
> - Jackin kissa.
> + Jackin kolli.
>
>
> eng-fin   boy's cat.
> - pojan kissa.
> + pojan kolli.
>
>
> [1]
> <http://wiki.apertium.org/wiki/Apertium-fin-eng/Pending_tests#Genitives>
>
> --
> Doktor Tommi A Pirinen, Computational Linguist,
> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
> Entwickler.  President of ACL SIGUR SIG for Uralic languages
> <http://gtweb.uit.no/sigur/>.
> I tend to follow inline-posting style in desktop e-mail messages.



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-24 Thread Marc Riera Irigoyen
Sorry, I made a mistake in the reference bidix entry in the last message
that could break pairs already using genitives. The good entry should be
like this (what I originally said in a previous message):

's

I have tweaked the CG rules and now everything should work even better.
Thanks!

2017-07-24 13:03 GMT+02:00 Marc Riera Irigoyen <
marc.riera.irigo...@gmail.com>:

> The changes are now live. I've finally not included the apostrophe form in
> the paradigms; instead, it's a form of the 's lemma that is removed via CG
> when the previous lemma is not a noun or proper noun ending in "s". I'm
> currently training the apertium-eng tagger, so I'll add some forbids
> related to genitives in the TSX files to rely less on CG.
>
> In order to add support for genitives in any pair using apertium-eng,
> simply add the following entry (change LR to RL if English is on the right
> side of the bidix):
>
> 
>
> To generate the genitive form of any noun or proper noun, add "+'s"
> after the noun. The correct form of the genitive will be generated
> automatically. Example:
>
> house+'s = house's
> house+'s = houses'
>
> Thank you!
>
> 2017-07-19 0:35 GMT+02:00 Marc Riera Irigoyen <
> marc.riera.irigo...@gmail.com>:
>
>> Fran, I agree with your suggestion, I'll do it this way.
>>
>> Flammie, the only change you need to do in the bidix is to add an entry
>> like this:
>>
>> 's
>>
>> This will allow your pair to get a  lemma after the word in genitive
>> that can be used in transfer rules when translating from English. The other
>> way round is simpler, just make the rules add an extra  tag to a lemma
>> to get its genitive form (e.g. house will show up as "house's").
>>
>> I will wait until the end of the week (in case someone else has any
>> suggestion) and notify you all when I commit the changes. Thanks!
>>
>> 2017-07-18 20:34 GMT+02:00 Flammie Pirinen <flam...@iki.fi>:
>>
>>> 2017-07-18, Francis Tyers said:
>>>
>>> > On 2017-07-18 14:44, Marc Riera Irigoyen wrote:
>>> > > Hello! I'm working on apertium-eng-cat and I've been having some
>>> > > issues with genitives. Currently, many noun paradigms in
>>> > > apertium-eng (mostly irregular nouns) have a  form with the
>>> > > genitive. Not all nouns have it, so apertium-eng also includes a
>>> > > specific lemma for the regular genitive ('s) to add the  tag
>>> > > in the analysis.
>>> > >
>>> > > However, when using apertium-eng for generation, it is impossible to
>>> > > know in advance whether or not a lemma has the genitive form, so
>>> > > generation for lemmas without the  form is broken.
>>> > >
>>> > > I'm considering adding a  form to all the affected paradigms to
>>> > > fix this issue. Any objections?
>>> >
>>> > Are there other pairs relying on apertium-eng ? Would this break
>>> > them ?
>>>
>>>
>>> I’ve genitives and apertium-eng set up for e.g. fin-eng, but I’m ok with
>>> whichever solution, if you can post a bidix formula for that once
>>> you’ve updated the eng.
>>>
>>> --
>>> Flammie, computer scientist bachelor + linguist master = computational
>>> linguist doctor, free software Finnish localiser,
>>> and more! <http://www.iki.fi/flammie/>
>>
>>
>>
>> --
>>
>> *Marc Riera Irigoyen*
>> Freelance Translator EN/JA>CA/ES
>>
>> (+34) 652 492 008 <+34%20652%2049%2020%2008>
>>
>
>
>
> --
>
> *Marc Riera Irigoyen*
> Freelance Translator EN/JA>CA/ES
>
> (+34) 652 492 008 <+34%20652%2049%2020%2008>
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-24 Thread Marc Riera Irigoyen
The changes are now live. In the end, I have not included the apostrophe
form in the paradigms; instead, it is a form of the 's lemma that is removed
via CG when the previous lemma is not a noun or proper noun ending in "s". I'm
currently training the apertium-eng tagger, so I'll add some forbids
related to genitives in the TSX files to rely less on CG.

In order to add support for genitives in any pair using apertium-eng,
simply add the following entry (change LR to RL if English is on the right
side of the bidix):



To generate the genitive form of any noun or proper noun, add "+'s"
after the noun. The correct form of the genitive will be generated
automatically. Example:

house+'s = house's
house+'s = houses'
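
The surface rule behind these two forms can be sketched as follows. This is
only a string-level illustration of the behaviour described in this message
(a bare apostrophe after a noun ending in "s", 's otherwise); apertium-eng
itself does this through the dictionary and CG rather than with code like
this, and the function name is hypothetical.

```python
def genitive(noun):
    """Return the English genitive surface form of a noun or proper noun."""
    if noun.endswith("s"):
        # Nouns ending in "s" (plurals, names like "James") take only an apostrophe.
        return noun + "'"
    return noun + "'s"


if __name__ == "__main__":
    for word in ("house", "houses", "James", "Jack"):
        print(word, "->", genitive(word))
```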

Thank you!

2017-07-19 0:35 GMT+02:00 Marc Riera Irigoyen <
marc.riera.irigo...@gmail.com>:

> Fran, I agree with your suggestion, I'll do it this way.
>
> Flammie, the only change you need to do in the bidix is to add an entry
> like this:
>
> 's
>
> This will allow your pair to get a  lemma after the word in genitive
> that can be used in transfer rules when translating from English. The other
> way round is simpler, just make the rules add an extra  tag to a lemma
> to get its genitive form (e.g. house will show up as "house's").
>
> I will wait until the end of the week (in case someone else has any
> suggestion) and notify you all when I commit the changes. Thanks!
>
> 2017-07-18 20:34 GMT+02:00 Flammie Pirinen <flam...@iki.fi>:
>
>> 2017-07-18, Francis Tyers said:
>>
>> > On 2017-07-18 14:44, Marc Riera Irigoyen wrote:
>> > > Hello! I'm working on apertium-eng-cat and I've been having some
>> > > issues with genitives. Currently, many noun paradigms in
>> > > apertium-eng (mostly irregular nouns) have a  form with the
>> > > genitive. Not all nouns have it, so apertium-eng also includes a
>> > > specific lemma for the regular genitive ('s) to add the  tag
>> > > in the analysis.
>> > >
>> > > However, when using apertium-eng for generation, it is impossible to
>> > > know in advance whether or not a lemma has the genitive form, so
>> > > generation for lemmas without the  form is broken.
>> > >
>> > > I'm considering adding a  form to all the affected paradigms to
>> > > fix this issue. Any objections?
>> >
>> > Are there other pairs relying on apertium-eng ? Would this break
>> > them ?
>>
>>
>> I’ve genitives and apertium-eng set up for e.g. fin-eng, but I’m ok with
>> whichever solution, if you can post a bidix formula for that once
>> you’ve updated the eng.
>>
>> --
>> Flammie, computer scientist bachelor + linguist master = computational
>> linguist doctor, free software Finnish localiser,
>> and more! <http://www.iki.fi/flammie/>
>
>
>
> --
>
> *Marc Riera Irigoyen*
> Freelance Translator EN/JA>CA/ES
>
> (+34) 652 492 008 <+34%20652%2049%2020%2008>
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-18 Thread Marc Riera Irigoyen
Fran, I agree with your suggestion, I'll do it this way.

Flammie, the only change you need to make in the bidix is to add an entry
like this:

's

This will allow your pair to get a  lemma after the word in genitive
that can be used in transfer rules when translating from English. The other
way round is simpler, just make the rules add an extra  tag to a lemma
to get its genitive form (e.g. house will show up as "house's").

I will wait until the end of the week (in case someone else has any
suggestion) and notify you all when I commit the changes. Thanks!

2017-07-18 20:34 GMT+02:00 Flammie Pirinen <flam...@iki.fi>:

> 2017-07-18, Francis Tyers said:
>
> > On 2017-07-18 14:44, Marc Riera Irigoyen wrote:
> > > Hello! I'm working on apertium-eng-cat and I've been having some
> > > issues with genitives. Currently, many noun paradigms in
> > > apertium-eng (mostly irregular nouns) have a  form with the
> > > genitive. Not all nouns have it, so apertium-eng also includes a
> > > specific lemma for the regular genitive ('s) to add the  tag
> > > in the analysis.
> > >
> > > However, when using apertium-eng for generation, it is impossible to
> > > know in advance whether or not a lemma has the genitive form, so
> > > generation for lemmas without the  form is broken.
> > >
> > > I'm considering adding a  form to all the affected paradigms to
> > > fix this issue. Any objections?
> >
> > Are there other pairs relying on apertium-eng ? Would this break
> > them ?
>
>
> I’ve genitives and apertium-eng set up for e.g. fin-eng, but I’m ok with
> whichever solution, if you can post a bidix formula for that once
> you’ve updated the eng.
>
> --
> Flammie, computer scientist bachelor + linguist master = computational
> linguist doctor, free software Finnish localiser,
> and more! <http://www.iki.fi/flammie/>
>
>
>
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] Genitives in apertium-eng

2017-07-18 Thread Marc Riera Irigoyen
I hadn't considered the other uses, but I've come up with a better solution
that takes them into consideration and shouldn't break pairs relying on
apertium-eng.

Basically, the genitive form would be added as generation-only for
paradigms with a regular ('s) genitive. This would allow apertium to detect
the 's as both a genitive and a verb during analysis, and at the same time
provide an easy way to generate the genitive when needed. Example:


   
  's
 s 
 s'


Verb contractions are currently limited to pronoun+verb combinations, so
I'd have to add 's independently as a form of both "is" and "has". Nothing
would be removed, only added, so other pairs should be unaffected.
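
As a rough illustration of the generation-only idea, here is a toy Python
model. It is not lttoolbox code and does not reproduce the real apertium-eng
entries: the tag names, the ENTRIES table and both functions are made up for
this sketch. Entries marked "gen-only" are skipped during analysis, so 's
keeps its separate genitive and verb readings, while the generator can still
produce "house's" directly.

```python
# Toy model only: hypothetical tags and entries, not real apertium-eng data.
# Each entry is (surface form, analysis, direction).
# "both" entries are used for analysis and generation; "gen-only" entries
# are used for generation alone.
ENTRIES = [
    ("house",   "house<n><sg>",         "both"),
    ("houses",  "house<n><pl>",         "both"),
    ("house's", "house<n><sg>+'s<gen>", "gen-only"),
    ("houses'", "house<n><pl>+'s<gen>", "gen-only"),
    # The standalone 's stays ambiguous in analysis.
    ("'s", "'s<gen>",        "both"),
    ("'s", "be<vb><pres>",   "both"),
    ("'s", "have<vb><pres>", "both"),
]


def analyse(surface):
    """Return every analysis of a surface form, ignoring gen-only entries."""
    return [a for s, a, d in ENTRIES if s == surface and d != "gen-only"]


def generate(analysis):
    """Return the surface form for an analysis, or None if there is none."""
    for s, a, _ in ENTRIES:
        if a == analysis:
            return s
    return None


print(analyse("'s"))                        # three readings: genitive, be, have
print(generate("house<n><sg>+'s<gen>"))     # the gen-only entry is found
print(analyse("house's"))                   # gen-only entries add no analysis
```

A real analyser would also segment "house's" into "house" plus 's; the toy
looks up whole tokens only, which is why the last call returns an empty list.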

2017-07-18 17:10 GMT+02:00 Francis Tyers <fty...@prompsit.com>:

> On 2017-07-18 14:44, Marc Riera Irigoyen wrote:
>
>> Hello! I'm working on apertium-eng-cat and I've been having some
>> issues with genitives. Currently, many noun paradigms in apertium-eng
>> (mostly irregular nouns) have a  form with the genitive. Not all
>> nouns have it, so apertium-eng also includes a specific lemma for the
>> regular genitive ('s) to add the  tag in the analysis.
>>
>> However, when using apertium-eng for generation, it is impossible to
>> know in advance whether or not a lemma has the genitive form, so
>> generation for lemmas without the  form is broken.
>>
>> I'm considering adding a  form to all the affected paradigms to
>> fix this issue. Any objections?
>>
>
> Are there other pairs relying on apertium-eng ? Would this break them ?
>
> Otherwise, I don't see a problem, go ahead. Although I'd make it like
>
> house:house
> house:houses
> house+s:house's
> house+s:houses'
>
> Instead of:
>
> house:house
> house:houses
> house:house's
> house:houses'
>
> Note that you will also probably need to include the other uses of 's
> too (is / has):
>
> e.g. "The cat's around the corner", "That cat's had his last fish supper".
>
> F.
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] Genitives in apertium-eng

2017-07-18 Thread Marc Riera Irigoyen
Hello! I'm working on apertium-eng-cat and I've been having some issues
with genitives. Currently, many noun paradigms in apertium-eng (mostly
irregular nouns) have a  form with the genitive. Not all nouns have
it, so apertium-eng also includes a specific lemma for the regular genitive
('s) to add the  tag in the analysis.

However, when using apertium-eng for generation, it is impossible to know
in advance whether or not a lemma has the genitive form, so generation for
lemmas without the  form is broken.

I'm considering adding a  form to all the affected paradigms to fix
this issue. Any objections?


Marc Riera


[Apertium-stuff] Coding challenge SVN commit access

2017-04-04 Thread Marc Riera Irigoyen
Hello,

After successfully submitting my GSoC proposal, I have been asked to submit
my coding challenge. I actually finished it before the proposal, but I have
no access to commit to Apertium SVN. It would be great if I was granted
access.

My SourceForge username is marcriera.

Thank you!

Marc

-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


Re: [Apertium-stuff] My GSoC 2017 proposal

2017-04-01 Thread Marc Riera Irigoyen
Thank you very much for your feedback; it has helped me improve my proposal.

I agree with you that the estimates were low. I have raised them and
calculated an estimated coverage goal based on them. I have also added more
details about the planned tasks to answer your questions (I just had not
thought enough about the whole process of adding stems and rules).

I think that the English tagger is good enough at present and I have
not considered training it, but I am aware that the Catalan tagger fails
more frequently, so I have included some time in my work plan to train it
(if necessary). New stems will be added by frequency, and the priority of
new rules will be decided based on error frequency when testing the
corpora. CG may not be necessary at all, but if I do end up needing it I
will add it the same way.

Regards,

Marc

2017-03-30 15:58 GMT+02:00 Joonas Kylmälä <j.kylm...@gmail.com>:

> On 3/30/17, Francis Tyers <fty...@prompsit.com> wrote:
> > There are parallel corpora for English and Catalan, are you planning to
> > learn lexical selection rules ?
>
> Oh wow, I thought this wasn't implemented yet. I found the related
> wiki article:
> <http://wiki.apertium.org/wiki/Learning_rules_from_parallel_and_non-parallel_corpora>.
> Thanks Fran!
>
> -Joonas
>
>



-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] My GSoC 2017 proposal

2017-03-30 Thread Marc Riera Irigoyen
Hello everyone,

I have been working on my proposal for this year's GSoC and I have
published a first version of it on the wiki. You can find it here:
http://wiki.apertium.org/wiki/User:Marcriera/proposal

It would be great to get some feedback about it. The workplan is not final,
as I am working on the coding challenge and it will be based on the results.

Thank you!

Marc Riera

-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008


[Apertium-stuff] SVN Permission Request

2017-03-21 Thread Marc Riera Irigoyen
Hello, my name is Marc and I am working on my proposal for GSoC. I am
working on the en-ca and eng-cat language pairs and I have already made
some changes locally.

I would like to push the changes to SVN, so I would need permission to
contribute to SVN. My username at SourceForge is "marcriera".

Thank you very much,


Marc Riera

-- 

*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008