Hi Rico,

Thank you so much for your help. The deescape-special-chars.perl script did the job perfectly and removed all the special XML characters.

Now I have another question. I followed the Moses manual and trained Moses on the News Commentary corpus, so I now have a moses.ini file. Before starting the tuning step, I tried to test the trained system by translating a simple French sentence into English, but Moses consumed all the memory I have, which caused my laptop to stop responding (I have an Intel i7-4702MQ processor with 8 GB of RAM and enough disk space).

Can you please tell me what the problem was? Do I have to binarise the translation table, or is it normal for the system to consume that much memory?
Thanks again.

2015-03-26 12:47 GMT+01:00 Rico Sennrich <rico.sennr...@gmx.ch>:

> Abdelfetah Boumerdas <aa_boumerdas@...> writes:
> >
> > Hi All,
> >
> > I'm trying to build a translation model using Moses, and to do that I'm
> > using 2 corpora (Europarl and the News Commentary corpus provided in the
> > manual). When I reached the corpus preparation step, I noticed the
> > following problem: in the prepared Europarl files the apostrophe (') and
> > the quotation mark (") are replaced with &apos; and &quot; respectively,
> > but in the second corpus they are still unchanged.
> >
> > Can anyone please tell me why? Is it a problem with the file encoding (I
> > checked and they're both UTF-8), or is it another problem that I don't
> > know about?
> >
> > Thanks in advance.
>
> Hi Abdelfetah,
>
> some special characters (<, >, [, ], ", ', |) are reserved because they
> have special meaning in the phrase table and/or to support XML input. The
> tokenizer.perl script automatically replaces them with escape sequences,
> and the detokenizer unescapes them again. There are also the scripts
> (de)escape-special-chars.perl to go from one to the other without
> (de)tokenizing.
>
> Consistency (between corpora and between training and test time) is
> important. Is it possible that you used different versions of the
> tokenizer.perl script? Older versions did not do escaping.
>
> best wishes,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
BOUMERDAS Abdelfetah
5ème Année Option Systèmes Informatiques (SIQ)
Ecole nationale Supérieure d'Informatique ESI (ex INI)
BP 68 M Oued Smar 16309 - ALGER
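
For reference, the escaping described in the quoted reply can be sketched roughly as below. This is only an illustration in Python, not the actual Perl code of tokenizer.perl or (de)escape-special-chars.perl. The entity table is an assumption about what those scripts emit (including "&", which is not in the list quoted above), and the names ESCAPES, escape_special_chars and deescape_special_chars are hypothetical.

```python
# Assumed mapping from reserved characters to escape sequences.
# "&" is listed first so that, when escaping, the ampersands introduced
# by the later replacements are not escaped a second time.
ESCAPES = [
    ("&", "&amp;"),   # assumption: not named in the quoted list
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(line: str) -> str:
    """Replace each reserved character with its escape sequence."""
    for char, entity in ESCAPES:
        line = line.replace(char, entity)
    return line


def deescape_special_chars(line: str) -> str:
    """Undo the escaping; "&amp;" is restored last to avoid double decoding."""
    for char, entity in reversed(ESCAPES):
        line = line.replace(entity, char)
    return line


if __name__ == "__main__":
    sample = "l'homme a dit \"oui\""
    escaped = escape_special_chars(sample)
    print(escaped)
    print(deescape_special_chars(escaped) == sample)
```

Running one of the two directions over every corpus before training, and the same one over the test input, is one way to get the consistency the reply asks for.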