Marwen, yes. tokenizer.perl escapes the five reserved XML characters
to avoid conflicts with XML-RPC support. It also escapes other characters
that Moses reserves for internal use, such as the vertical bar (|)
and square brackets ([ and ]).
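
To make the escaping concrete, here is a rough Python sketch. It is not the tokenizer's code (that is Perl); the function name escape_moses_special_chars is made up for illustration, and the numeric entities used for |, [ and ] are an assumption from my reading of escape-special-chars.perl, so please check them against your own Moses checkout.

```python
# Hypothetical sketch of the escaping step; the real implementation is the
# Perl script shipped with Moses, and the entity choices below are assumed.
ESCAPES = [
    ("&", "&amp;"),    # first, so entities produced below are not re-escaped
    ("|", "&#124;"),   # vertical bar (assumed entity)
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),    # square brackets (assumed entities)
    ("]", "&#93;"),
]

def escape_moses_special_chars(text: str) -> str:
    """Replace each reserved character with its entity, in list order."""
    for raw, entity in ESCAPES:
        text = text.replace(raw, entity)
    return text

if __name__ == "__main__":
    print(escape_moses_special_chars("the company 's [draft] | A&B"))
```

The ampersand is handled first on purpose: every other replacement introduces an ampersand, and escaping it later would corrupt those entities.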

On Tue, 07 Aug 2012 11:42:57 +0200,
Marwen AZOUZI  wrote:  
Hi Moses,

I'm not sure your problem has anything to do with whether the
apostrophe is escaped or not. If you open the script
"mosesdecoder/scripts/tokenizer/escape-special-chars.perl", line 19
reads:

 s/'/&apos;/g; # xml

This means that the apostrophe is escaped with
respect to XML. Here's my question: _Why does Moses need to escape XML
special characters?_ Is it for this reason
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc7 [1]?

But as Barry mentioned, it's most likely a Unicode normalization problem.
In your training data, you may have ´s or ’s instead of the ASCII 's.
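
If that turns out to be the cause, one option is to fold the look-alike characters to the ASCII apostrophe before tokenizing both the training and the test data. The following is a minimal Python sketch; the function name normalize_apostrophes and the set of variants are my own assumptions and are not part of Moses. An explicit table is used because standard Unicode normalization forms such as NFKC do not map U+2019 to the ASCII apostrophe.

```python
# Apostrophe look-alikes mapped to the ASCII apostrophe (U+0027).
# This set is illustrative only, not an exhaustive list.
APOSTROPHE_VARIANTS = {
    "\u00b4": "'",  # ACUTE ACCENT
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
}
_TABLE = str.maketrans(APOSTROPHE_VARIANTS)

def normalize_apostrophes(text: str) -> str:
    """Return text with the listed apostrophe variants replaced by U+0027."""
    return text.translate(_TABLE)

if __name__ == "__main__":
    print(normalize_apostrophes("the company\u2019s 10th consecutive quarter"))
```

Whatever mapping is chosen, it has to be applied identically to the training corpus and to the input at translation time.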


Marwen

 On 08/07/2012 10:51 AM, Tan, Jun wrote:  

Hi Barry,

How do I
check the Moses version? I'm sure that the tokeniser used for training is
the same as the one used for testing. I'm using the Stanford Word Segmenter
for Chinese.

-----Original Message-----
From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk [2]]
Sent: Tuesday, August 07, 2012 4:43 PM
To: Tan, Jun
Cc: tah...@precisiontranslationtools.com [3]; moses-support@mit.edu [4]
Subject: Re: [Moses-support] how does Moses handle with the apostrophes?

Hi Jun

Is the apostrophe in your source
data an ASCII apostrophe or a Unicode variant (use xxd to check this)?
As Tom said, recent versions of the Moses tokeniser escape apostrophes,
so either you're using an old version, or it does not recognise the
character as an apostrophe.
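
As an alternative to reading a hex dump, a short Python helper can list every non-ASCII character in a line together with its code point, Unicode name and UTF-8 bytes. This is only a sketch; the function name describe_non_ascii is hypothetical, and it is meant for the English side of the data, where any non-ASCII character near an "s" is a likely suspect.

```python
import unicodedata

def describe_non_ascii(text: str):
    """Return (index, code point, Unicode name, UTF-8 hex) per non-ASCII char."""
    report = []
    for index, ch in enumerate(text):
        if ord(ch) > 127:
            report.append((
                index,
                f"U+{ord(ch):04X}",
                unicodedata.name(ch, "UNKNOWN"),
                ch.encode("utf-8").hex(),
            ))
    return report

if __name__ == "__main__":
    for entry in describe_non_ascii("marking the company\u2019s 10th quarter"):
        print(entry)
```

An empty result for a source sentence means its apostrophes are plain ASCII, in which case the tokeniser version is the next thing to check.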

Make sure you are using the same tokeniser in training and
test.

cheers - Barry

On 07/08/12 06:38, jun....@emc.com [5] wrote:

Yes, I'm using Moses's tokenizer.perl for English, and
Moses was installed in June, so the version should be relatively
new.
Do you have any ideas how to fix it?
From: Tom Hoar [mailto:tah...@precisiontranslationtools.com [6]]
Sent: Tuesday, August 07, 2012 1:13 PM
To: Tan, Jun
Cc: moses-support@mit.edu [7]
Subject: Re: [Moses-support] how does Moses handle with the apostrophes?

If you're
using Moses' tokenizer.perl script, the English handling separates
"company's" into "company 's". In recent (~2 months) Moses GitHub
releases, the tokenizer.perl script also escapes the string to
"company &apos;s". The English detokenizer unescapes the "&apos;s" to "'s"
and restores it without the preceding space.
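
The round trip described above can be sketched in a few lines of Python. This is a heavily simplified illustration that handles only the English possessive 's; the real tokenizer and detokenizer are Perl scripts covering many more cases, and both function names here are hypothetical.

```python
import re

def tokenize_possessive(text: str) -> str:
    """Split 's from the preceding word, then escape the apostrophe."""
    split = re.sub(r"(\w)'s\b", r"\1 's", text)
    return split.replace("'", "&apos;")

def detokenize_possessive(text: str) -> str:
    """Unescape the apostrophe, then re-attach 's to the preceding word."""
    unescaped = text.replace("&apos;", "'")
    return re.sub(r"(\w) 's\b", r"\1's", unescaped)

if __name__ == "__main__":
    tokenized = tokenize_possessive("marking the company's 10th quarter")
    print(tokenized)
    print(detokenize_possessive(tokenized))
```

The point of the sketch is that the token the decoder sees is the escaped form, so the training data has to be tokenized and escaped the same way for that token to be known to the model.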

On Tue, 7 Aug 2012 00:33:07 -0400, wrote:
Hi all,

When I use Moses to translate sentences
that contain apostrophes, it doesn't work correctly.
Source:
EMC Corporation
(NYSE:EMC) today reported strong financial results for the second
quarter of 2012, marking the company's 10th consecutive quarter of
double-digit year-over-year growth for consolidated revenue, GAAP net
income, and GAAP and non-GAAP EPS. EMC expects to achieve its full-year
2012 goals for consolidated revenue, non-GAAP EPS and free cash
flow.

Translation result:
2012 年 7 月 24 日 -- EMC 公司 ( NYSE : EMC) 今天 报告
了 强有力 的 财务 业绩 2012 年 第 2 
季度 , 标志 着 公司 's 连续 10 个 季度 实现 两 位 数 的 同比 增长 ,
以 实现 整合 的 收入 、 GAAP 净 收入 
和 GAAP 和 非 GAAP 每 股 收益 。 EMC 预计 到 2012 年 实现 其
目标 的 要求 年 全 年 的 合并 收入 、 
非 GAAP EPS 和 自由 现金流 。

As we can see,
"company's" was translated as "公司 's": the apostrophe (') and the
letter "s" were not translated.
Does anybody know the
cause of this issue? Do I need some other module to handle it? Does
anybody know how to fix it?

Thanks

_______________________________________________
Moses-support
mailing list
Moses-support@mit.edu
[10]
http://mailman.mit.edu/mailman/listinfo/moses-support [11]

--
The
University of Edinburgh is a charitable body, registered in Scotland,
with registration number
SC005336.




Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc7
[2] mailto:bhad...@staffmail.ed.ac.uk
[3] mailto:tah...@precisiontranslationtools.com
[4] mailto:moses-support@mit.edu
[5] mailto:jun....@emc.com
[6] mailto:tah...@precisiontranslationtools.com
[7] mailto:moses-support@mit.edu
[8] mailto:jun....@emc.com
[9] mailto:jun....@emc.com
[10] mailto:Moses-support@mit.edu
[11] http://mailman.mit.edu/mailman/listinfo/moses-support
[12] mailto:Moses-support@mit.edu
[13] http://mailman.mit.edu/mailman/listinfo/moses-support