Re: [Moses-support] Problem with corpus preparation

2015-03-28 Thread Rico Sennrich

On 28/03/15 13:26, Abdelfetah Boumerdas wrote:

Hi Rico,
Thank you so much for your help. The deescape-special-chars.perl script 
did the job perfectly and removed all the special XML characters.
Now I have another question. I followed the Moses manual and trained 
Moses on the News Commentary corpus, so I now have a moses.ini file. 
Before tuning, I tried to test the trained system by translating a 
simple French sentence into English, but Moses consumed all the memory 
I have, which caused my laptop to stop responding (I have an Intel 
i7-4702MQ processor with 8GB RAM and enough disk space). Can you please 
tell me what the problem is? Do I have to binarise the translation 
table, or is it normal for the system to consume that much memory?


Thanks again.

Hi Abdelfetah,

It's not uncommon for Moses to use more than 8GB of RAM during decoding, 
depending on the size of your models. The Moses documentation lists some 
ways to reduce memory usage: 
http://www.statmt.org/moses/?n=Moses.Optimize#ntoc19
You might also want to consider using a computer with more memory.


best wishes,
Rico
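
One way to cut decoder memory, offered here as an assumption rather than a summary of the linked page, is to filter the phrase table down to the entries that can actually match the sentences you want to translate. Moses ships a script for this (filter-model-given-input.pl); the Python below is not that script, only a minimal sketch of the idea. The function names, the max_len default, and the toy table entries are made up for illustration.

```python
def source_ngrams(sentences, max_len=7):
    """Collect every n-gram (up to max_len tokens) found in the input sentences."""
    ngrams = set()
    for sentence in sentences:
        tokens = sentence.split()
        for start in range(len(tokens)):
            last = min(start + max_len, len(tokens))
            for end in range(start + 1, last + 1):
                ngrams.add(" ".join(tokens[start:end]))
    return ngrams


def filter_phrase_table(table_lines, sentences, max_len=7):
    """Keep only phrase-table entries whose source side occurs in the input.

    Each entry is assumed to use the "source ||| target ||| scores" layout.
    """
    wanted = source_ngrams(sentences, max_len)
    kept = []
    for line in table_lines:
        source = line.split("|||", 1)[0].strip()
        if source in wanted:
            kept.append(line)
    return kept
```

For a single test sentence, only the handful of entries whose source phrase appears in that sentence survive, so the table the decoder has to hold becomes a small fraction of the original. Binarising the table, as asked about above, is a separate and complementary step.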

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem with corpus preparation

2015-03-28 Thread Abdelfetah Boumerdas
Hi Rico,
Thank you so much for your help. The deescape-special-chars.perl script did
the job perfectly and removed all the special XML characters.
Now I have another question. I followed the Moses manual and trained Moses
on the News Commentary corpus, so I now have a moses.ini file. Before
tuning, I tried to test the trained system by translating a simple French
sentence into English, but Moses consumed all the memory I have, which
caused my laptop to stop responding (I have an Intel i7-4702MQ processor
with 8GB RAM and enough disk space). Can you please tell me what the
problem is? Do I have to binarise the translation table, or is it normal
for the system to consume that much memory?

Thanks again.

2015-03-26 12:47 GMT+01:00 Rico Sennrich :

> Abdelfetah Boumerdas  writes:
>
> > Hi All,
> > I'm trying to build a translation model using Moses, and to do that I'm
> > using 2 corpora (Europarl and the News Commentary corpus provided in the
> > manual). When I reached the corpus preparation step I noticed the
> > following problem: in the prepared Europarl files the apostrophe (') and
> > the quotation marks (") are replaced with (&apos;) and (&quot;)
> > respectively, but in the second corpus they are unchanged.
> > Can anyone please tell me why? Is it a problem with the file encoding (I
> > checked and they're both UTF-8), or is it another problem that I don't
> > know about?
> > Thanks in advance.
> > --
>
>
> Hi Abdelfetah,
>
> Some special characters (<, >, [, ], ", ', |) are reserved because they
> have special meaning in the phrase table and/or to support XML input. The
> tokenizer.perl script automatically replaces them with escape sequences,
> and the detokenizer unescapes them again. There are also the scripts
> (de)escape-special-chars.perl to go from one to the other without
> (de)tokenizing.
>
> Consistency (between corpora and between training and test time) is
> important. Is it possible that you used different versions of the
> tokenizer.perl script? Older versions did not do escaping.
>
> best wishes,
> Rico



-- 
BOUMERDAS Abdelfetah
5ème Année Option Systèmes Informatiques (SIQ)
Ecole nationale Supérieure d'Informatique ESI (ex INI)
BP 68 M Oued Smar 16309 - ALGER


Re: [Moses-support] Problem with corpus preparation

2015-03-26 Thread Rico Sennrich
Abdelfetah Boumerdas  writes:

> Hi All,
> I'm trying to build a translation model using Moses, and to do that I'm
> using 2 corpora (Europarl and the News Commentary corpus provided in the
> manual). When I reached the corpus preparation step I noticed the
> following problem: in the prepared Europarl files the apostrophe (') and
> the quotation marks (") are replaced with (&apos;) and (&quot;)
> respectively, but in the second corpus they are unchanged.
> Can anyone please tell me why? Is it a problem with the file encoding (I
> checked and they're both UTF-8), or is it another problem that I don't
> know about?
> Thanks in advance.


Hi Abdelfetah,

Some special characters (<, >, [, ], ", ', |) are reserved because they have
special meaning in the phrase table and/or to support XML input. The
tokenizer.perl script automatically replaces them with escape sequences, and
the detokenizer unescapes them again. There are also the scripts
(de)escape-special-chars.perl to go from one to the other without
(de)tokenizing.

Consistency (between corpora and between training and test time) is
important. Is it possible that you used different versions of the
tokenizer.perl script? Older versions did not do escaping.

best wishes,
Rico
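
To make the escaping step concrete, here is a minimal Python sketch of the kind of mapping the escape and de-escape scripts apply. It is not the Perl code itself: the function names are invented for illustration, and the table reflects the escape sequences as I understand the Moses scripts to use them, so check it against your own copy of tokenizer.perl before relying on it.

```python
# Assumed mapping from reserved characters to their escape sequences.
# "&" comes first so that the ampersands introduced by the later
# replacements are not escaped a second time.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(text):
    """Replace each reserved character with its escape sequence."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text


def deescape_special_chars(text):
    """Undo escape_special_chars; "&amp;" is restored last."""
    for raw, escaped in reversed(ESCAPES):
        text = text.replace(escaped, raw)
    return text
```

Running the same function over every corpus, and over the test input, is what keeps training and test data consistent: a corpus containing l'homme and another containing l&apos;homme would otherwise be treated as having different tokens.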
