Hi Rico,
Thank you so much for your help. The deescape-special-chars.perl script did
the job perfectly and removed all the special XML chars.
Now I have another question. I followed the Moses manual and trained Moses
on the news commentary corpus, so I now have the moses.ini file. Before
starting the tuning step, I tried to test the trained system by translating
a simple French sentence into English, but Moses consumed all the memory I
have, which caused my laptop to stop responding (I have an Intel i7-4702MQ
processor with 8GB RAM and enough disk space). Can you please tell me what
the problem is? Do I have to binarise the translation table, or is it
normal for the system to consume that much memory?

Thanks again.

2015-03-26 12:47 GMT+01:00 Rico Sennrich <rico.sennr...@gmx.ch>:

> Abdelfetah Boumerdas <aa_boumerdas@...> writes:
>
> > Hi All,
> > I'm trying to build a translation model using Moses, and to do that I'm
> > using 2 corpora (Europarl and the news commentary corpus provided in the
> > manual). When I reached the corpus preparation step I noticed the
> > following problem: in the prepared Europarl files the apostrophe (') and
> > the quotation marks are replaced with (&apos;) and (&quot;) respectively,
> > but in the second corpus they are still unchanged.
> > Can anyone please tell me why? Is it a problem with the files' encoding
> > (I checked and they're both UTF-8), or is it another problem that I
> > don't know about?
> > Thanks in advance.
> > --
>
>
> Hi Abdelfetah,
>
> some special characters (<, >, [, ], ", ', |) are reserved because they
> have special meaning in the phrase table and/or to support XML input. The
> tokenizer.perl script automatically replaces them with escape sequences,
> and the detokenizer unescapes them again. There are also the
> (de)escape-special-chars.perl scripts to go from one to the other without
> (de)tokenizing.
>
> consistency (between corpora and between training and test time) is
> important. Is it possible that you used different versions of the
> tokenizer.perl script? Older versions did not do escaping.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
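
For reference, the escaping round trip Rico describes can be sketched
roughly as below. This is only an illustration in Python, not the Moses
code itself. The thread confirms only the &apos; and &quot; sequences; the
other entries in the table, and the function names, are assumptions made
for this sketch and should be checked against the actual
escape-special-chars.perl before relying on them.

```python
# Illustrative sketch only: not the Moses Perl scripts.
# Only &apos; and &quot; are confirmed by the thread; the remaining
# escape sequences are assumptions for the reserved characters listed.
ESCAPES = [
    ("&", "&amp;"),    # assumed; handled first so later sequences stay intact
    ("|", "&#124;"),   # assumed
    ("<", "&lt;"),     # assumed
    (">", "&gt;"),     # assumed
    ("'", "&apos;"),   # confirmed in the thread
    ('"', "&quot;"),   # confirmed in the thread
    ("[", "&#91;"),    # assumed
    ("]", "&#93;"),    # assumed
]


def escape_special_chars(text):
    """Replace reserved characters with their escape sequences."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text


def deescape_special_chars(text):
    """Undo escape_special_chars; '&amp;' is restored last."""
    for raw, escaped in reversed(ESCAPES):
        text = text.replace(escaped, raw)
    return text


if __name__ == "__main__":
    line = "l'homme a dit \"oui\""
    escaped = escape_special_chars(line)
    print(escaped)
    print(deescape_special_chars(escaped) == line)
```

Whichever form is chosen, the same form has to be applied to every corpus
and to the test input, which is the consistency point made above.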



-- 
BOUMERDAS Abdelfetah
5ème Année Option Systèmes Informatiques (SIQ)
Ecole nationale Supérieure d'Informatique ESI (ex INI)
BP 68 M Oued Smar 16309 - ALGER