Hi Mikel,

Thank you for your prompt reply. I am sorry that I didn't explain in detail
in my first post what I was trying to do.

I have already added some symbols into the dictionary and it works
perfectly!

Thank you so much.

Best,
Konstantinos
On 29 Jan 2015 10:31, "Mikel L. Forcada" <m...@dlsi.ua.es> wrote:

>  I just realized I hadn't copied the list.
>
> Mikel
>
> On 29/01/15 at 09:30, Mikel L. Forcada wrote:
>
> Konstantinos:
>
> I get it now. I didn't make the connection between Hoang+Koehn's factored
> models and your "corpus factorization", apologies for that.
>
> Yes, for those characters not declared in the dictionary's alphabet, the
> analyser just skips one character, to which it does not assign a lexical
> form, and then moves on. This may be the source of your problems. If the
> software generating vertical-bar-separated factors just looks at Apertium's
> delimited lexical forms between ^ and $, then whatever is outside these
> forms is lost.
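
To make the point above concrete, here is a minimal Python sketch. It assumes
the factor-generating script does something like a regex scan for ^...$ units;
the pattern and the function name surface_forms are hypothetical and are not
taken from the actual script. It also assumes that no escaped ^ or $ occurs
inside a unit.

```python
import re

# Hypothetical pattern: one Apertium lexical unit, from ^ up to the next $.
# Assumption: no escaped ^ or $ inside the unit (real streams may contain them).
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")

def surface_forms(analysed):
    """Return the surface form (the text before the first '/') of each unit."""
    return [m.group(1).split("/", 1)[0] for m in LEXICAL_UNIT.finditer(analysed)]

# Analyser output with a stray "&" sitting between two lexical units.
line = "^Nonstandard/*Nonstandard$ & ^rich/rich<adj><sint>$^./.<sent>$"
print(surface_forms(line))
```

With this input the scan yields only Nonstandard, rich and the full stop: the
"&" never reaches the output because it is not inside any ^...$ unit.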
>
> However, the English analyser I've just tested (from apertium-en-es,
> version 48232) does seem to assign lexical forms to euros and such, look:
>
> $ echo "This costs 100 EURO" | apertium -d . en-es-anmor
> ^This/This<det><dem><sg>/This<prn><tn><mf><sg>$
> ^costs/cost<n><pl>/cost<vblex><pri><p3><sg>$ ^100/100<num>$ 
> ^EURO/EURO<mon>$^./.<sent>$
>
> This is because of entries like:
>
>      <e>
>       <re>[£$EURO][0-9]+([. ,][0-9][0-9][0-9])*([.,][0-9]+)?m</re>
>       <p>
>         <l></l>
>         <r><s n="num"/><s n="mon"/></r>
>       </p>
>     </e>
>
> or
>
>  <e><re>[$EURO £]</re><p><l></l><r><s n="mon"/></r></p></e>
>
> in apertium-en-es.en.metadix (the English morphological dictionary)
>
> But it misses symbols such as &:
>
> $ echo "Nonstandard & rich" | apertium -d . en-es-anmor
> ^Nonstandard/*Nonstandard$ & ^rich/rich<adj><sint>$^./.<sent>$
>
> There are a few ways out of this. One would be to add these marks to the
> dictionary and recompile it, for instance by adding "&" as a form of "and".
> Another is to catch them as stray material between lexical forms and give
> them some tag.
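
The second way out (catching stray material and tagging it) could be sketched
in Python roughly as follows. This is only an illustration under stated
assumptions: the function name to_factors, the placeholder tags "sym" and
"unk", and the simplified three-factor surface|lemma|tag layout are all
hypothetical, and the sketch takes only the first analysis of each unit. It
does not reproduce the script from the wiki page.

```python
import re

# Match either a lexical unit (^...$) or a run of stray non-space characters.
# Assumption: no escaped ^ or $ inside units.
TOKEN = re.compile(r"\^([^$]*)\$|([^\s^]+)")
TAG = re.compile(r"<([^>]+)>")

def to_factors(analysed, stray_tag="sym"):
    """Turn analyser output into surface|lemma|tag tokens, keeping stray material."""
    out = []
    for m in TOKEN.finditer(analysed):
        if m.group(2) is not None:
            # Stray material outside ^...$: keep it, with a placeholder tag.
            stray = m.group(2)
            out.append(f"{stray}|{stray}|{stray_tag}")
            continue
        surface, _, rest = m.group(1).partition("/")
        first = rest.split("/", 1)[0]  # first analysis only (a simplification)
        if not first or first.startswith("*"):
            # Unknown word: the analyser marks it with a leading asterisk.
            out.append(f"{surface}|{surface}|unk")
            continue
        lemma = first.split("<", 1)[0]
        tags = TAG.findall(first)
        out.append(f"{surface}|{lemma}|{tags[0] if tags else 'unk'}")
    return " ".join(out)

line = "^Nonstandard/*Nonstandard$ & ^rich/rich<adj><sint>$^./.<sent>$"
print(to_factors(line))
```

For the "Nonstandard & rich" example this keeps the ampersand as "&|&|sym"
instead of dropping it. A stray "|" would still clash with the factor
separator and would need escaping before training.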
>
> Please come back if some of this does not make sense.
>
> Cheers
>
> Mikel
>
>
>
>
>  On 28/01/15 at 22:53, CHATZITHEODOROU Konstantinos wrote:
>
> Hi Mikel,
>
>  Thank you for your reply. I would like to train a factored model using
> Moses, and I followed the instructions at
> http://wiki.apertium.org/wiki/Preparing_data_for_Moses_factored_training_using_Apertium
>
>  The issue mostly concerns symbols. For instance, the output for the
> sentence "This costs 100 EURO." is "This|this|det|morpho costs|cost|pos|morpho
> 100|100|num|num .|.|sent|sent". In that case, the EURO is missing from the
> output.
>
>  Some other symbols that are missing from Apertium's output are: ", /, *, &, etc.
>
>  Thank you very much!
>
>  Cheers,
> Konstantinos
>
>
>
>
>
>
> 2015-01-28 21:55 GMT+02:00 Mikel Forcada <m...@dlsi.ua.es>:
>
>> Hi, Konstantinos.
>> I am not sure what you mean by "factorize". Also, what do you mean by
>> "missing symbols"? An example would help us help you!
>>
>> Mikel Forcada
>>
>> On Wed, 28 Jan 2015 20:53:58 +0200,
>> CHATZITHEODOROU Konstantinos <dinosa...@gmail.com> wrote:
>>
>> > hi,
>> >
>> > I have used Apertium's resources to factorise the corpora for
>> > training a Moses model, but I realised that lots of symbols are
>> > missing. Is that correct? How can I get these symbols?
>> >
>> > Best,
>> > Konstantinos
>>
>>
>
>
>  --
> CHATZITHEODOROU Konstantinos
>
>
>
> --
>  Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03071 Alacant, Spain
> Phone: +34 96 590 9776
> Fax: +34 96 590 9326
>
>
>
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
