Re: [Apertium-stuff] Universitat d'Alacant vs Universidad de Alicante

2021-03-13 Thread Jaume Ortolà i Font
Good morning, Gonzalo.

The Apertium dictionaries have been extended over time according to the
commissions of different institutions (the Generalitat, universities, etc.).
Each institution has its own requirements, depending on its needs and its
style guide.

In the case of the Universitat de València, the only official name is the
Valencian one, and the university itself requires that it always be written
that way.

If the name of the Universitat d'Alacant is bilingual, it seems reasonable
to translate Universitat d'Alacant>Universidad de Alicante (cat>spa). If
there is no objection, I will make the change myself.

Regards,
Jaume Ortolà


Message from Gonzalo Cao Cabeza de Vaca on Fri, 12 March 2021 at 16:13:

> Hello,
>
> I have been using Apertium's services for many years, and there is a
> problem I have to deal with daily. When you translate the term
> "Universidad de Alicante" from Spanish into Valencian you do get
> "Universitat d'Alacant", but in the opposite direction it is not
> translated: "Universitat d'Alacant" is kept.
>
> In my work translating texts and news items this is a constant headache,
> because our university has an official name in Spanish, and to get a
> correct translation from Valencian into Spanish I very often have to
> replace this name by hand. Multiple occurrences per text, multiple texts
> per day.
>
> I consider this an error. I see that other universities in the same area,
> such as the one in València, are not translated either. What is the reason
> for this? Wouldn't it be more correct to translate, especially when there
> is an official name?
>
> Regards
>
> --
> Gonzalo Cao Cabeza de Vaca
> Secretariat de Promoció Cultural i Lingüística de la UA
> Apt. cor. 99. 03080 Alacant tel. 965903400 - ext 2391
> Cultura en la UA on the web, Facebook, Twitter and Instagram
> LlengüesUA on Facebook, Twitter and Instagram
>
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>


Re: [Apertium-stuff] A talk evaluating Apertium

2020-10-19 Thread Jaume Ortolà i Font
Message from Hèctor Alòs i Font on Mon, 19 Oct 2020 at 20:07:

> On Mon, 19 Oct 2020 at 19:58, Xavi Ivars wrote:
>
>> Well, that's only "part" of the corpus... and for the Europarl, that part
>> of corpus was not left "as is" after Apertium, but also postedited.
>>
>
> Wow! Did you postedit the whole Europarl corpus?! No matter if you used
> Apertium or not, it's clear that you did tons of work. If it is explained
> somewhere how Softcatalà did the work, and with how many resources (time,
> volunteers, money), please let us know. It would be an excellent test case
> to show whether a (real) under-resourced language can or cannot obtain the
> material needed for neural translation.
>

No. The corpus was not postedited. It has 2 million sentences. I tried to
get a Catalan translation as good as possible. What I did was:

- Try to cover all relevant vocabulary: all non-capitalized words that
appear at least 4-5 times in the corpus.
- Fix spelling and grammar errors in the Spanish corpus using LanguageTool
(for example, missing diacritics or agreement errors). The Spanish text is
worse than expected.
- Fix many common errors in spa-cat Apertium translation.
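
The first step above (finding the vocabulary that must be covered) can be sketched roughly as follows. This is only an illustration, not the script actually used: the function names, the tokenization regex and the threshold of 4 are assumptions chosen to mirror the "at least 4-5 times" criterion.

```python
import re
from collections import Counter


def frequent_lowercase_words(sentences, min_count=4):
    """Count non-capitalized word tokens and keep those seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for token in re.findall(r"[^\W\d_]+", sentence):
            # Skip capitalized tokens (proper nouns, sentence-initial words).
            if token[0].islower():
                counts[token] += 1
    return {word: n for word, n in counts.items() if n >= min_count}


def uncovered(frequent, known_words):
    """Return the frequent words that are missing from a (hypothetical) dictionary word list."""
    return sorted(word for word in frequent if word not in known_words)


corpus = ["El parlamento aprueba la propuesta"] * 4 + ["La comisión rechaza la propuesta"]
frequent = frequent_lowercase_words(corpus, min_count=4)
missing = uncovered(frequent, known_words={"la", "propuesta"})
```

In practice the list of known words would come from the analyser itself (for example, by collecting the forms it marks as unknown) rather than from a plain set.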

This work is not complete. To finish it, we'll probably need 3-4 months of
full-time work or more. Anyway, a neural translator can work even if a
percentage of the corpus is not perfect.

Jaume Ortolà


Re: [Apertium-stuff] A talk evaluating Apertium

2020-10-18 Thread Jaume Ortolà i Font
Message from Hèctor Alòs i Font on Sun, 18 Oct 2020 at 7:50:

> Xavi, I am impressed that you could in Softcatalà get enough bilingual
> texts to create an English-Catalan neural translator. Congratulations on
> the results! I am curious to know how big the corpus you collected has
> been, as well as from which sources to ensure the quality of the
> translations.
>

The corpora used can be found here:
https://github.com/Softcatala/en-ca-corpus

One of the corpora is an automatic translation of the English-Spanish
Europarl corpus using Spanish-Catalan Apertium. It has proved good enough
to train the neural translator.

The neural translator could be improved with better corpora and using more
powerful hardware in the training. The vocabulary size is limited because
of hardware constraints.


> I'd maybe add that it would probably not be possible to collect such a
> corpus for Valencian Catalan, so I guess we face in this neural translator
> a typical problem with lesser-used languages/varieties. If it is ever
> considered necessary to generate Valencian, this will have to be done by
> translating into "reference" Catalan and then automatically adapting it.
> In fact the same happens for the many flavours we currently have in
> Apertium for Catalan, both Valencian and "Catalonian".
>

It is easy to make a Catalan>Valencian adapter (a few lines of code using
LanguageTool). The other way around is not so easy, because some Valencian
verbal forms are ambiguous.
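
A toy sketch of why the reverse direction is harder. It does not use LanguageTool; the three-entry table below is a hypothetical stand-in for a real form mapping. Several "reference" Catalan forms collapse into one Valencian form, so inverting the table gives more than one candidate.

```python
# Hypothetical toy table: "reference" Catalan verb form -> Valencian form.
CENTRAL_TO_VALENCIAN = {
    "canto": "cante",    # 1st person singular, present indicative
    "canti": "cante",    # 1st/3rd person singular, present subjunctive
    "cantem": "cantem",  # unchanged form
}


def adapt(tokens, table):
    """Replace each token by its mapped form, leaving unknown tokens untouched."""
    return [table.get(token, token) for token in tokens]


def invert(table):
    """Build the reverse mapping, keeping every candidate source form."""
    inverse = {}
    for source, target in table.items():
        inverse.setdefault(target, []).append(source)
    return inverse


valencian = adapt(["jo", "canto"], CENTRAL_TO_VALENCIAN)
candidates = invert(CENTRAL_TO_VALENCIAN)["cante"]
```

Going back from "cante" requires choosing between an indicative and a subjunctive form, which needs context rather than a lookup.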


> By the way, is Softcatalà trying to create a neural translator for the
> Spanish-Catalan pair?
>

Not yet. Neural translators require a lot of hardware resources, in
training and in production. We could not support the current volume of
Spanish-Catalan translations with neural translation.

Jaume Ortolà


Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Jaume Ortolà i Font
Message from Xavi Ivars on Wed, 9 Sept 2020 at 13:53:

> Message from Tanmai Khanna on Wed, 9 Sept 2020 at 11:34:
>
>> Hey guys,
>> I'm writing a system demonstration to be submitted at LowResMT 2020 about
>> the recent project that was done as part of GSoC, titled "Translating the
>> internet into low resource languages with Apertium" (Accepting snazzier
>> title suggestions).
>>
>> As part of this demonstration, I want to show some real world examples of
>> how the new system of markup handling will help the translation of webpages
>> and formatted documents - odt, pptx, rtx, etc. To show this effectively, I
>> need to choose 3-4 released language pairs that are sufficiently
>> syntactically divergent that they show the effect of markup reordering in
>> the translation output. As far as I know, spa-cat is one of our most mature
>> pairs, however I'm not sure how syntactically divergent it is. If it is,
>> then I'm happy to be corrected. If your language pair has had issues with
>> webpage translation and those issues are now solved (ish), then some
>> examples would be really helpful.
>>
>>
> Spanish and Catalan are very similar in terms of syntax. We could
> definitely try to get examples of where they diverge the most, but those
> examples would need to be completely synthetic.
>
> The new markup handling helps in other areas, though: in formats where
> inline tags are common (like ODT), the previous formatter/deformatter
> was splitting words where tags appeared, so translation of those has
> improved quite a lot.
>

Spanish and Catalan diverge syntactically in the use of some prepositions.
For example, *de que* (spa) usually becomes *que* (cat).

In the last few days there has been some oscillation in the translations
when a quotation mark falls right in the middle of *de "que*. See:
https://github.com/apertium/apertium-spa-cat/commit/f6f7ee2f560b7ae817ac87c33278a5c4354090dc

But I'm not sure if there has been a real improvement or not. I have to
look at it more carefully.

Jaume


Re: [Apertium-stuff] Update about superblanks in transfer

2020-08-30 Thread Jaume Ortolà i Font
Message from Tino Didriksen on Sun, 30 Aug 2020 at 11:15:

> Why is - a blank in the first place? If it's needed in contexts, it should
> be fully analyzed as a token.
> This goes for all Apertium languages and pairs. I don't understand why
> punctuation generally isn't analyzed. I assume it's just historic.
>

There are pros and cons. For instance, if you analyze a quotation mark (")
as a token, you need to adjust every disambiguation rule where the quote
can appear (which is everywhere, in fact), and that can be very annoying.

I don't have a definitive answer. My guess (in the languages I am familiar
with) is that most punctuation marks should interrupt the analysis, except
for quotation marks, which should not (with some exceptions in turn).

Are the changes being implemented going to alter the behavior of the
punctuation marks that are not analyzed as tokens?

Jaume


Re: [Apertium-stuff] ADDCOHORT in Constraint Grammar

2020-05-28 Thread Jaume Ortolà i Font
>
> Isn't this something that should go in transfer,


Dropping this "que" is possible in Spanish, but it is not regular syntax;
it is a mannerism used in bureaucratic jargon. The regular syntax is with
"que". It makes sense to add it, so all language pairs can translate as
usual.

In my experience, transfer is extremely annoying for this kind of thing.


> or you could
> use apertium-separable for it?
>

Probably, yes. We are not using apertium-separable in spa-cat, and it would
be useful to start doing so.

Jaume

Fran
>


Re: [Apertium-stuff] ADDCOHORT in Constraint Grammar

2020-05-28 Thread Jaume Ortolà i Font
Message from Tino Didriksen on Thu, 28 May 2020 at 11:02:

> Not currently possible, but that's certainly needed. I've created
> https://github.com/TinoDidriksen/cg3/issues/58 for tracking / discussion.
>

Thanks, Tino. For the time being, I will add an entry "que" to the
dictionaries.

Jaume Ortolà

On Thu, 28 May 2020 at 10:52, Jaume Ortolà i Font 
wrote:

> Hi,
>
> I'm using ADDCOHORT to add a word "que" in the Spanish sentence "ruego me
> perdonen".
>
> ADDCOHORT:ruego_que ("<que>" "que" cnjsub) AFTER (vblex) (0 VerbsRogar) (0
> (p1)) (1 Vblex + Pers);
>
> $ echo "Ruego me perdonen" | apertium -d . spa-cat
> Prego queem perdonin
>
> It works fine, but a white space needs to be added after "que". How can it
> be done? How can I add a white space to the metadata?
>
> Jaume Ortolà


[Apertium-stuff] ADDCOHORT in Constraint Grammar

2020-05-28 Thread Jaume Ortolà i Font
Hi,

I'm using ADDCOHORT to add a word "que" in the Spanish sentence "ruego me
perdonen".

ADDCOHORT:ruego_que ("<que>" "que" cnjsub) AFTER (vblex) (0 VerbsRogar) (0
(p1)) (1 Vblex + Pers);

$ echo "Ruego me perdonen" | apertium -d . spa-cat
Prego queem perdonin

It works fine, but a white space needs to be added after "que". How can it
be done? How can I add a white space to the metadata?

Jaume Ortolà


Re: [Apertium-stuff] Lexical Selection

2020-04-03 Thread Jaume Ortolà i Font
Message from egea piñeiro helena on Wed, 1 Apr 2020 at 10:48:

> How to show the text translated with the multiple options due to polysemy.
> "The *season/station* more rainy is"
>

This is a recurring request that could be useful in some applications, but
there is currently no way to do it in Apertium.


> Also, to study this, I'm trying to find out how to make some changes to
> the lexical rules (.lrx files) and see an actual change in the output, but
> the commands in the pipeline don't seem to refer to that file. All
> that I can change to test different polysemy cases are the 'srl' and 'lrs'
> in the dictionary, along with the 'D' option (default). I assumed the
> autolex.bin somehow called the lrx file but I don't know for sure and I
> couldn't find information so far.
>

You can also see examples of lexical selection rules in the spa-cat metalrx
files. In eng-cat, Marc Riera can tell you how it is done.

Jaume Ortolà


[Apertium-stuff] odt translation not working

2019-10-21 Thread Jaume Ortolà i Font
Hi,

I'm using the nightly version of Apertium, and the translation of ODT files
is not working.

This does nothing. The result is the same file unchanged:
$ apertium -d . -f odt spa-cat test.odt test-cat.odt

It seems to happen with any language pair and only with ODT files.

I'm not the only one experiencing this issue. Other people pointed it out
to me.

Jaume


Re: [Apertium-stuff] quotation marks in CG

2019-10-19 Thread Jaume Ortolà i Font
Message from Tino Didriksen on Sat, 19 Oct 2019 at 16:58:

> Potentially, with
> https://visl.sdu.dk/cg3/chunked/tags.html#stream-metadata
>

Thanks! It works.

Jaume


On Sat, 19 Oct 2019 at 16:43, Jaume Ortolà i Font 
> wrote:
>
>> Hi,
>>
>> When there are quotation marks in a sentence you want to translate, most
>> of the time it's better to ignore them and run the whole process as if
>> there were no quotation marks at all. But occasionally it's necessary to
>> check whether there are quotation marks.
>>
>> In the current spa-cat translation, we don't assign a token to quotation
>> marks and don't tag them. If we did, there would be many things to
>> change along the translation pipeline.
>>
>> Is there any way to "have my cake and eat it"? In Constraint Grammar can
>> I check if there is a quotation mark at some point but without it being a
>> proper token?
>>
>> Jaume Ortolà
>>


[Apertium-stuff] quotation marks in CG

2019-10-19 Thread Jaume Ortolà i Font
Hi,

When there are quotation marks in a sentence you want to translate, most of
the time it's better to ignore them and run the whole process as if there
were no quotation marks at all. But occasionally it's necessary to check
whether there are quotation marks.

In the current spa-cat translation, we don't assign a token to quotation
marks and don't tag them. If we did, there would be many things to change
along the translation pipeline.

Is there any way to "have my cake and eat it"? In Constraint Grammar can I
check if there is a quotation mark at some point but without it being a
proper token?

Jaume Ortolà


Re: [Apertium-stuff] Adding invariant prefixes

2019-09-18 Thread Jaume Ortolà i Font
Message from Kevin Brubeck Unhammer on Wed, 18 Sept 2019 at 12:02:

> > I have tried adding a mark to the newly formed words and removing it with
> > CG if necessary. It works fine.
>
> Why not keep it all the way through the translator? That seems safer to
> me, and you don't have to worry that they may not be synonymous.
>

Some of these words can be very difficult to disambiguate (and to foresee):
prerrogativa (noun) vs. pre + (r)rogativa (adj). I found them because they
caused translation errors. So it is safer to remove the analysis with
prefix, and keep the original POS tags in the dictionary.


Re: [Apertium-stuff] Adding invariant prefixes

2019-09-18 Thread Jaume Ortolà i Font
Thanks for the answers.

Message from Jonathan Washington on Tue, 17 Sept 2019 at 22:11:

> Jaume, are you planning on using this for translation or something else?
> If for translation, how do you anticipate it improving translation quality?
>

These prefixes will be used for translating spa-cat, and they could also be
used for other Romance language pairs. Hèctor Alòs is interested in it.

I have tried the first option proposed by Kevin with just adjectives and
some prefixes in Spanish:


  anti
  pro
  post
  pos > post
  pre



  antir > anti
  pror > pro
  post > post
  prer > pre
  anti > anti
  pro > pro
  pos > post
  pre > pre
  


In the Europarl corpus it finds around one new word (untranslated so far)
every 5000 sentences. A few more prefixes can be added, and the same would
be done with nouns and verbs.

We'll need to create metadix files so that the dictionaries don't become
cluttered with the new tags. The metadix will be useful also for other
things.

Some new words formed with prefixes can match existing words. All these
should be discarded beforehand.
prefiero (verb) = pre + fiero (adj)
presumo (verb) = pre + sumo (adj)
prerrogativa (noun) = pre + (r)rogativa (adj)
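
A rough sketch of how such clashes could be listed beforehand. It is only an illustration: the function name and the tiny word lists are hypothetical, and real data would come from the monolingual dictionary.

```python
def prefix_collisions(prefixes, adjectives, lexicon):
    """List (form, prefix, adjective) triples whose generated form already exists as a word."""
    collisions = []
    for prefix in prefixes:
        for adjective in adjectives:
            forms = [prefix + adjective]
            if adjective.startswith("r"):
                # Spelling variant with doubled r, e.g. pre + rogativa -> prerrogativa.
                forms.append(prefix + "r" + adjective)
            for form in forms:
                if form in lexicon:
                    collisions.append((form, prefix, adjective))
    return collisions


# Hypothetical sample data taken from the examples above.
existing_words = {"prefiero", "presumo", "prerrogativa"}
found = prefix_collisions(["pre"], ["fiero", "sumo", "rogativa", "natal"], existing_words)
```

The resulting list could then be used to block the prefixed analysis for exactly those forms.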

I have tried adding a mark to the newly formed words and removing it with
CG if necessary. It works fine.

pre > -prefix-pre

REMOVE:prefixes ("-prefix-.*"r) IF (0 ("-prefix-.*"r));

I think adding this feature is productive and worthwhile. What do you think
(Hèctor, Marc, Xavi...)?
Any suggestion to improve it?

Jaume


[Apertium-stuff] Adding invariant prefixes

2019-09-17 Thread Jaume Ortolà i Font
Hi,

I would like to be able to automatically translate certain words formed by
"a certain prefix + a certain POS" without having to add new entries to the
dictionaries. For example, any word formed by "anti" + any valid adjective
in spa<>cat translations:

antihúngaro <> antihongarès
antihúngaras <> antihongareses
antialemán <> antialemany
antipluvial <> antipluvial
antiestatista <> antiestatista
...

The word forms and the POS tags would remain unchanged. (But in some
languages some spelling changes may be necessary. In Spanish, "anti" +
"ruso" becomes "antirruso".)
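
The Spanish spelling adjustment mentioned here can be sketched as a small helper. This is an assumption-laden illustration (the function name is hypothetical, and only the r-doubling after a vowel-final prefix is handled), not a description of how Apertium does or would do it.

```python
def attach_prefix(prefix, adjective):
    """Join prefix and adjective, doubling an initial r after a vowel-final prefix."""
    needs_double_r = (
        adjective.startswith("r")
        and not adjective.startswith("rr")
        and prefix[-1] in "aeiou"
    )
    if needs_double_r:
        return prefix + "r" + adjective
    return prefix + adjective


examples = [attach_prefix("anti", word) for word in ("ruso", "húngaro", "alemán")]
```

Analysis has to undo the same change, which is why a plain "prefix + known adjective" lookup is not quite enough for Spanish.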

This feature could be used in a lot of language pairs. Has it been
implemented anywhere? How could it be done?

Jaume Ortolà


Re: [Apertium-stuff] Stop merging lines

2019-09-06 Thread Jaume Ortolà i Font
Did you find a solution to the line-merging problem?

I ran into this problem while translating the Europarl corpus (spa>cat). It
did not happen before, but it does with the current nightly version. It
turns out that the sentences where the merging occurs (or starts?) contain
a 'soft hyphen' character (U+00AD). Once this character is removed (in
fact, it should be replaced by an em dash), there is no merging.
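
A small sketch for locating and removing the character before translating. The function names are hypothetical; the only fact relied on is that the soft hyphen is code point U+00AD.

```python
SOFT_HYPHEN = "\u00ad"


def lines_with_soft_hyphen(lines):
    """Return the 1-based numbers of the lines that contain a soft hyphen."""
    return [number for number, line in enumerate(lines, start=1) if SOFT_HYPHEN in line]


def strip_soft_hyphens(text):
    """Remove every soft hyphen from the text."""
    return text.replace(SOFT_HYPHEN, "")


sample = ["Primera frase.", "Segunda fra\u00adse.", "Tercera frase."]
affected = lines_with_soft_hyphen(sample)
cleaned = [strip_soft_hyphens(line) for line in sample]
```

Replacing the character with something else, as suggested above, would only change the second argument of `replace`.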

Another change in behavior I have noticed is related to characters not in
the language alphabet.
Before it was: Kwaśniewski > *Kwaś*niewski
Now it is: Kwaśniewski > *Kwaśniewski
Which is preferable.

Jaume Ortolà


Re: [Apertium-stuff] Sentir i sentar

2019-01-23 Thread Jaume Ortolà i Font
On my command line, with the latest version on GitHub, the sentence is
translated correctly. There is a CG rule that always analyses the form as
"sentir", except in a few exceptional cases where it picks "sentar". It is
the least bad solution I found.

On beta.apertium.org, softcatala.org/traductor and salt.gva.es/ca/traductor
it is currently translated wrongly. I see that the changes to this rule are
fairly recent, about a month old.[1] They probably haven't reached the
websites yet.

On the other hand, there are many other CG rules that do work well. "Dos
años y medio" is translated wrongly on apertium.org, but correctly on the
other sites I mentioned.

The gap between spa-cat in production and in development is now
excessively large (more than a year, with many changes). The 'stable'
branch on GitHub should be released.

Regards,
Jaume Ortolà

[1]
https://github.com/apertium/apertium-spa/commit/24d8c8c91abf86b076ff0a7b405081cbd9daec66



Message from Hèctor Alòs i Font on Wed, 23 Jan 2019 at 10:40:

> It must be a disambiguation problem. I believe that in the Catalan
> disambiguation we use CG to rule out the subjunctive reading when there is
> no "que" before the verb in the sentence. I suppose the same should be
> done for Spanish. What I don't know is why it used to work and no longer
> does.
>
> Message from Joan Moratinos Jaume on Wed, 23 Jan 2019 at 12:33:
>
>> I see that the beta version mistranslates the sentence "Narra a la
>> audiencia cómo se siente y qué piensa" (spa-cat). The production version
>> translates it correctly. Do you know what is going on?
>>
>> --
>> Joan Moratinos
>> jmorati...@gmail.com


[Apertium-stuff] LRX and RLX syntax questions

2018-05-23 Thread Jaume Ortolà i Font
Hi,

In the LRX file I have noticed that this:



matches any verb containing the string "ver": ver, devolver, revolver,
verter, versionar, versar... (72 verbs in Spanish).

Although this can be useful in a few cases (I have used it once), we
definitely need a way to match only "ver".  How can it be done? Is there a
bug in the program to be fixed?
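
The behaviour described is what one would expect from an unanchored pattern match. The sketch below only illustrates that difference with Python's `re` module; it makes no claim about how the lexical selection module is actually implemented, and the lemma list is a small hypothetical sample.

```python
import re

# Hypothetical sample of Spanish verb lemmas.
VERB_LEMMAS = ["ver", "devolver", "revolver", "verter", "versionar", "versar", "ir"]


def substring_matches(pattern, lemmas):
    """Unanchored search: the pattern may occur anywhere inside the lemma."""
    return [lemma for lemma in lemmas if re.search(pattern, lemma)]


def exact_matches(pattern, lemmas):
    """Anchored match: the pattern must cover the whole lemma."""
    return [lemma for lemma in lemmas if re.fullmatch(pattern, lemma)]


loose = substring_matches("ver", VERB_LEMMAS)
strict = exact_matches("ver", VERB_LEMMAS)
```

Both behaviours are useful, which matches the observation above that the loose match has been handy at least once.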

On the other hand, I have some questions about Constraint Grammar syntax. I
am not able to fully understand the manual. A few examples will be very
useful for me. With (*/* "ver") we can match a lemma anywhere in the
sentence. What is the syntax for...

- matching something in 4 tokens around a word (that is to say, positions
-4, -3, -2, -1, 1, 2, 3, 4)?
- matching something in 4 tokens at the left or at the right of a word?

Thanks,
Jaume


[Apertium-stuff] difficult mach in LRX file

2018-04-11 Thread Jaume Ortolà i Font
Hi,

I want to match a token like this:

^Dar# cuenta+te$

in a lexical selection rule (LRX file).

I need to check the tags on the right ("prn.enc.p2.mf.sg"). Is it
possible? What is the syntax to do it?

Jaume Ortolà


[Apertium-stuff] Dealing with line endings

2018-04-06 Thread Jaume Ortolà i Font
Hi,

We are having some undesired differences in git commits because of line
endings[1]. Depending on the operating system and the text editor you use,
the newline character may end up being changed.

Can the repository configuration be modified to avoid this issue? Or can
somebody recommend a common approach to be followed by all Apertium
contributors? Some options here [2].
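
Independently of any repository setting, a contributor could check files before committing. The sketch below is one hypothetical way to do that on raw bytes; it only handles Windows-style CRLF endings and the function names are assumptions.

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the content contains Windows-style line endings."""
    return b"\r\n" in data


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


original = b"<e><p>\r\n</p></e>\r\n"
fixed = normalize_newlines(original)
```

Working on bytes rather than decoded text avoids touching the encoding of the dictionary files.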

Jaume

[1]
https://github.com/apertium/apertium-spa/commit/0e766bdd56b7505677c2c28f1beb4600b4893508
[2] https://help.github.com/articles/dealing-with-line-endings/


Re: [Apertium-stuff] the 'mango' issue

2017-07-02 Thread Jaume Ortolà i Font
2017-07-02 19:59 GMT+02:00 Francis Tyers :

>
> There are two possibilities:
>
> 1) you could mark them in CG, e.g.
>
> SUBSTITUTE ("mango") ("mango¹") (0* MangoWords);
> SUBSTITUTE ("mango") ("mango²") (0* ManecWords);
>
> Then in the bidix you would have:
>
> mango¹ > mango
> mango² > mànec
>
>
Ok. Thanks.

In the first possibility the disambiguation would be used in every language
pair with Spanish. To keep the compatibility with the current dictionaries
I would do just:

SUBSTITUTE ("mango") ("mango¹") (0* MangoWords);

mango > mànec  [English: handle; French: manche; Portuguese: cabo...; but
Italian has mango]
mango¹ > mango  [This line could be added optionally to other bilingual
dictionaries]

Jaume


[Apertium-stuff] the 'mango' issue

2017-07-02 Thread Jaume Ortolà i Font
The Spanish word "mango" (n. masc.) has two main meanings:

1) handle. E.g. El mango de la sartén. / The handle of the frying pan.
2) mango (fruit or tree). E.g. He comido mango. / I have eaten mango.

Finding the right translation is not difficult. "Handle" is the default
translation. But if certain words are used in the sentence, then "mango"
should be the preferred translation.

I implemented a rule in a Catalan grammar checker to find wrong
translations of mango. With a list of about 100 lemmas, the results are
good enough. These lemmas come from a Catalan corpus (where the two
meanings are different words: mànec/mango).

How can this approach be best implemented in Apertium? I translated the
list of 100 words to Spanish and wrote some rules in the Constraint-based
lexical selection module.  But I need to write a different rule for each
distance from the ambiguous word.[1] Is there a way to indicate "any place
in the sentence"? If implemented, I think it could be very productive. The
same list of Spanish words can be used by other language pairs.
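
The idea of "any place in the sentence" can be sketched outside Apertium as a simple lookup over the lemmas of the sentence. The cue list below is a made-up handful of words, not the real list of about 100 lemmas, and the function name is hypothetical.

```python
# Hypothetical cue lemmas that suggest the fruit/tree sense of Spanish "mango".
MANGO_FRUIT_CUES = {"comer", "fruta", "árbol", "maduro", "zumo"}


def translate_mango(sentence_lemmas, cues=MANGO_FRUIT_CUES):
    """Pick the Catalan translation: 'mango' if any cue lemma occurs, else the default 'mànec'."""
    if any(lemma in cues for lemma in sentence_lemmas):
        return "mango"
    return "mànec"


handle_sense = translate_mango(["el", "mango", "de", "el", "sartén"])
fruit_sense = translate_mango(["haber", "comer", "mango"])
```

The position of the cue word does not matter here, which is exactly what a separate rule per distance cannot express compactly.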

Jaume Ortolà

[1] https://apertium.projectjj.com/trac/changeset/80066


[Apertium-stuff] matching Spanish contractions in Constraint Grammar

2017-06-30 Thread Jaume Ortolà i Font
Hi,

I need  to match a Spanish contraction ("del" or "al") in Constraint
Grammar, but I am unable to do it.

The contraction is:
^del/de<pr>+el<det><def><m><sg>$

The only thing that seems to be visible in the Constraint Grammar syntax is
the second element of the contraction:
el

It matches ("el") or (det def m sg), but not ("de"), ("del") or (pr).

Does anybody  know how to match the first element of the contraction (the
preposition)?

This could be solved differently. I think these contractions should be
tokenized earlier in the pipeline as two tokens. This way we would avoid a
lot of exceptions and workarounds when dealing with them. Is it feasible?
These contractions are extremely frequent and now they cause a lot of
undesired results.

Jaume Ortolà


[Apertium-stuff] Introduction

2017-06-24 Thread Jaume Ortolà i Font
Hello,

My name is Jaume Ortolà. I have experience building Catalan dictionaries
and spell and grammar checkers.

Now I am using this experience to help fix and expand the Apertium
monolingual dictionaries in Catalan, Spanish, French and perhaps other
languages. I am working closely with Xavi Ivars.

I would like to get commit access to the SVN repository. My SourceForge id
is jaumeortola.

Thank you,
Jaume Ortolà


Re: [Apertium-stuff] Bug in "apertium-viewer" (or, really, bug in lttoolbox-java)

2017-06-04 Thread Jaume Ortolà i Font
Hi, Jacob.

With your new version of apertium-viewer, I get an exception error when
starting the program.

I also get a compilation error with the source code. See below.

Regards,
Jaume Ortolà


Exception in thread "AWT-EventQueue-0" java.lang.NoSuchMethodError: org.apertium.pipeline.Program.getParameterList()Ljava/util/List;
    at apertiumview.sourceeditor.SourcecodeFinder.createHtmlLinkText(SourcecodeFinder.java:94)
    at apertiumview.TextWidget.setProgram(TextWidget.java:272)
    at apertiumview.ApertiumView.setMode(ApertiumView.java:628)
    at apertiumview.ApertiumView.updateSelectedMode(ApertiumView.java:515)
    at apertiumview.ApertiumView.modesComboBoxActionPerformed(ApertiumView.java:1264)
    at apertiumview.ApertiumView.updateModesComboBox(ApertiumView.java:771)
    at apertiumview.ApertiumView.loadModes(ApertiumView.java:452)
    at apertiumview.ApertiumView.<init>(ApertiumView.java:279)
    at apertiumview.ApertiumViewMain.startup(ApertiumViewMain.java:28)
    at apertiumview.ApertiumViewMain$2.run(ApertiumViewMain.java:78)
    at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:756)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:726)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)


-do-compile:
[javac] Compiling 5 source files to /home/jaume/apertium/apertium-viewer/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.7
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/Pipeline.java:140: error: cannot find symbol
[javac] List args = new ArrayList(program.getParameterList());
[javac]  ^
[javac]   symbol:   method getParameterList()
[javac]   location: variable program of type Program
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/Pipeline.java:141: error: cannot find symbol
[javac] args = Dispatcher.replace("$1", markUnknownWords ? "-g" : "-n", args);
[javac]  ^
[javac]   symbol:   method replace(String,String,List)
[javac]   location: class Dispatcher
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/Pipeline.java:142: error: cannot find symbol
[javac] args = Dispatcher.replace("$2", "", args); // Don't display ambiguity
[javac]  ^
[javac]   symbol:   method replace(String,String,List)
[javac]   location: class Dispatcher
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/Pipeline.java:143: error: cannot find symbol
[javac] args = Dispatcher.replace("$3", "", args); // What is this $3 ??!??
[javac]  ^
[javac]   symbol:   method replace(String,String,List)
[javac]   location: class Dispatcher
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/sourceeditor/SourcecodeFinder.java:38: error: cannot find symbol
[javac] for (String param : program.getParameterList()) {
[javac]   ^
[javac]   symbol:   method getParameterList()
[javac]   location: variable program of type Program
[javac] /home/jaume/apertium/apertium-viewer/src/apertiumview/sourceeditor/SourcecodeFinder.java:94: error: cannot find symbol
[javac] for (String param : program.getParameterList()) {
[javac]   ^
[javac]   symbol:   method getParameterList()
[javac]   location: variable program of type Program
[javac] 6 errors
[javac] 1 warning


Regards,
Jaume Ortolà
www.riuraueditors.cat

2017-06-03 1:03 GMT+02:00 Jacob Nordfalk :

> Hi there!
>
> Thanks for reporting this!
>
> I have fixed the bugs so that quotes ( " and ' ) are allowed in file
> names, and paths with spaces should work as well, both in lttoolbox-java
> and in apertium-viewer.
>
> I just uploaded a new version of apertium-viewer to
> http://javabog.dk/filer/apertium/apertium-viewer.jar
>
> Everything, including invocation of the C++ version, should work now;
> please try it out with
> wget http://javabog.dk/filer/apertium/apertium-viewer.jar
> java -jar apertium-viewer.jar
>
> When you confirm this I will make a new version for the 'build' folder.
> (I hope someone els