Dears,

I found the problem.

At line 289 of the tokenizer.perl script, just add a space, like this:

The original code

$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 ' $2/g;

The modified one 

$text =~ s/([\p{IsAlpha}])[']([\p{IsAlpha}])/$1 '  $2/g;

With this modification, tokenizing a file produces the same output as
tokenizing a single segment.
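
To illustrate how the whitespace in the replacement part of such a
substitution decides between the two outputs reported in this thread, here
is a minimal Perl sketch. It is not the tokenizer.perl code itself: the
helper split_apostrophe and its $sep argument are hypothetical names, and
the two separator strings are assumptions chosen only to reproduce the
"printer 's" and "printer ' s" forms.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of tokenizer.perl): split an apostrophe
# that sits between two letters, placing $sep between those letters.
sub split_apostrophe {
    my ($text, $sep) = @_;
    # Same pattern shape as the line quoted above: letter, apostrophe, letter.
    $text =~ s/(\p{IsAlpha})'(\p{IsAlpha})/$1$sep$2/g;
    return $text;
}

my $sentence = "configuring your printer's wireless connection";

# Assumed separator with a space before the apostrophe only.
my $attached = split_apostrophe($sentence, " '");

# Assumed separator with a space on both sides of the apostrophe.
my $detached = split_apostrophe($sentence, " ' ");

print "$attached\n";   # configuring your printer 's wireless connection
print "$detached\n";   # configuring your printer ' s wireless connection
```

Whichever form is chosen, the training files and the sentences sent to the
decoder need to be tokenized the same way.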

Thanks

 

From: Ihab Ramadan [mailto:i.rama...@saudisoft.com] 
Sent: Wednesday, January 14, 2015 11:14 AM
To: moses-support@mit.edu
Subject: RE: Tokenization problem

 

Dears,

I still have this problem. To avoid confusing the decoder, I used the
“-no-escape” parameter of the tokenizer.perl script, but the tokenizer still
adds an extra space after the apostrophe when tokenizing a file, whereas
tokenizing a single segment produces no extra space.

For example:

In the file:

“which will guide you through connecting and configuring your printer's
wireless connection.” -> “which will guide you through connecting and
configuring your printer ' s wireless connection .”

As a segment:

“which will guide you through connecting and configuring your printer's
wireless connection.” -> “which will guide you through connecting and
configuring your printer 's wireless connection .”

If it is the same script, I wonder why it generates two different outputs.

I have no experience in Perl, so I could not find the line of code that
behaves differently depending on whether the segment is in a file or is
passed to the script as a single segment.

Please help

 

 

 

From: Ihab Ramadan [mailto:i.rama...@saudisoft.com] 
Sent: Monday, January 5, 2015 10:09 AM
To: moses-support@mit.edu
Subject: Tokenization problem

 

Dears,

Using the tokenizer on the training files replaces the apostrophes with
“' s” (with a space), but if I use the same script to tokenize a single
sentence, it turns the apostrophes into “'s” (without a space).

This problem confuses the decoder during translation.

How can I solve this problem?

Thanks  

 

Best Regards

Ihab Ramadan | Senior Developer | Saudisoft - Egypt <http://www.saudisoft.com/>
Tel +2 02 330 320 37 Ext- 0 | Mob +201007570826 | Fax +20233032036

 

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
