Marwen, yes. Tokenizer.perl escapes the five reserved XML characters to avoid conflicts with XML-RPC support. It also escapes other reserved characters used internally in Moses such as the vertical bar | character and square brackets [ and ].
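As a rough illustration of the escaping described above, here is a minimal Python sketch. It is not the Moses implementation: the entity table is an assumption modelled on the substitutions in escape-special-chars.perl, and the function names `escape_for_moses` and `unescape_from_moses` are hypothetical. Check the script in your own checkout for the exact entities your version emits.

```python
# Assumed mapping, modelled on escape-special-chars.perl; the exact
# entities may differ between Moses versions.
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),  # factor separator
    ("<", "&lt;"),    # xml
    (">", "&gt;"),    # xml
    ("'", "&apos;"),  # xml
    ('"', "&quot;"),  # xml
    ("[", "&#91;"),   # syntax non-terminal
    ("]", "&#93;"),   # syntax non-terminal
]


def escape_for_moses(text: str) -> str:
    """Replace each reserved character with its entity (hypothetical helper)."""
    for raw, entity in ESCAPES:
        text = text.replace(raw, entity)
    return text


def unescape_from_moses(text: str) -> str:
    """Undo escape_for_moses; reverse order so '&amp;' is restored last."""
    for raw, entity in reversed(ESCAPES):
        text = text.replace(entity, raw)
    return text


if __name__ == "__main__":
    tokenized = "company 's"
    escaped = escape_for_moses(tokenized)
    print(escaped)
    print(unescape_from_moses(escaped))
```

The ampersand is handled first when escaping and last when unescaping, so text that already contains something like "&lt;" survives a round trip unchanged.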
On Tue, 07 Aug 2012 11:42:57 +0200, Marwen AZOUZI wrote:

Hi Moses,

I'm not sure whether your problem has anything to do with the apostrophe being escaped. If you open the script "mosesdecoder/scripts/tokenizer/escape-special-chars.perl", line 19 reads:

s/\'/\&apos;/g; # xml

This means that the apostrophe is escaped with respect to XML. Here's my question: _Why does Moses need to escape XML special characters_? Is it for this reason http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc7 [1]?

But as Barry mentioned, it's most likely a Unicode normalization problem. In your training data, you may have ´s or 's instead of the ASCII 's.

Marwen

On 08/07/2012 10:51 AM, Tan, Jun wrote:

Hi Barry,

How do I check the Moses version? I'm sure that the tokeniser used for training is the same as the one used for testing. I'm using the Stanford Word Segmenter for Chinese.

-----Original Message-----
From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk [2]]
Sent: Tuesday, August 07, 2012 4:43 PM
To: Tan, Jun
Cc: tah...@precisiontranslationtools.com [3]; moses-support@mit.edu [4]
Subject: Re: [Moses-support] how does Moses handle with the apostrophes?

Hi Jun

Is the apostrophe in your source data an ASCII apostrophe, or a Unicode variant (use xxd to check this)? As Tom said, recent versions of the Moses tokeniser escape apostrophes, so either you're using an old version, or it does not recognise the character as an apostrophe. Make sure you are using the same tokeniser in training and test.

cheers - Barry

On 07/08/12 06:38, jun....@emc.com [5] wrote:

Yes, I'm using Moses's tokenizer.perl for English, and Moses was installed in June, so the version should be relatively new. Do you have any ideas how to fix it?

From: Tom Hoar [mailto:tah...@precisiontranslationtools.com [6]]
Sent: Tuesday, August 07, 2012 1:13 PM
To: Tan, Jun
Cc: moses-support@mit.edu [7]
Subject: Re: [Moses-support] how does Moses handle with the apostrophes?
If you're using Moses' tokenizer.perl script, the English handling separates "company's" into "company 's". In Moses GitHub releases from roughly the last two months, the tokenizer.perl script also escapes the string to "company &apos;s". The English detokenizer unescapes the "&apos;s" to "'s" and restores it without the preceding space.

On Tue, 7 Aug 2012 00:33:07 -0400, wrote:

Hi all,

When I use Moses to translate sentences that contain apostrophes, it doesn't work correctly. Below is an example:

Source: EMC Corporation (NYSE:EMC) today reported strong financial results for the second quarter of 2012, marking the company's 10th consecutive quarter of double-digit year-over-year growth for consolidated revenue, GAAP net income, and GAAP and non-GAAP EPS. EMC expects to achieve its full-year 2012 goals for consolidated revenue, non-GAAP EPS and free cash flow.

Translation result: 2012 年 7 月 24 日 -- EMC 公司 ( NYSE : EMC) 今天 报告 了 强有力 的 财务 业绩 2012 年 第 2 季度 , 标志 着 公司 's 连续 10 个 季度 实现 两 位 数 的 同比 增长 , 以 实现 整合 的 收入 、 GAAP 净 收入 和 GAAP 和 非 GAAP 每 股 收益 。 EMC 预计 到 2012 年 实现 其 目标 的 要求 年 全 年 的 合并 收入 、 非 GAAP EPS 和 自由 现金流 。

As we can see, "company's" was translated as "公司 's": the apostrophe (') and the letter "s" were left untranslated. Does anybody know the cause of this issue? Do I need some other module to handle it? Does anybody know how to fix it?

Thanks

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu [10]
http://mailman.mit.edu/mailman/listinfo/moses-support [11]

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
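Barry's suggestion in the thread is to inspect the bytes with xxd. As an alternative, the following Python sketch reports apostrophe look-alikes by code point and maps them to the ASCII apostrophe. The variant set is illustrative and not exhaustive, and the names `APOSTROPHE_VARIANTS`, `find_apostrophe_variants` and `normalize_apostrophes` are hypothetical; none of this is part of Moses.

```python
import unicodedata

# Non-ASCII characters often used in place of the ASCII apostrophe (U+0027).
# Illustrative only; extend the set to match your own data.
APOSTROPHE_VARIANTS = {
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u2018",  # LEFT SINGLE QUOTATION MARK
    "\u00b4",  # ACUTE ACCENT
    "\u02bc",  # MODIFIER LETTER APOSTROPHE
}


def find_apostrophe_variants(line: str):
    """Return (index, code point, Unicode name) for each variant in line."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(line)
        if ch in APOSTROPHE_VARIANTS
    ]


def normalize_apostrophes(line: str) -> str:
    """Replace every listed variant with the ASCII apostrophe."""
    return "".join("'" if ch in APOSTROPHE_VARIANTS else ch for ch in line)


if __name__ == "__main__":
    sample = "marking the company\u2019s 10th consecutive quarter"
    for hit in find_apostrophe_variants(sample):
        print(hit)
    print(normalize_apostrophes(sample))
```

If a normalization step like this is used, it would need to run on both the training and the test data before tokenization, so that the tokenizer sees the same character in both.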
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu [12]
http://mailman.mit.edu/mailman/listinfo/moses-support [13]

Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc7
[2] mailto:bhad...@staffmail.ed.ac.uk
[3] mailto:tah...@precisiontranslationtools.com
[4] mailto:moses-support@mit.edu
[5] mailto:jun....@emc.com
[6] mailto:tah...@precisiontranslationtools.com
[7] mailto:moses-support@mit.edu
[8] mailto:jun....@emc.com
[9] mailto:jun....@emc.com
[10] mailto:Moses-support@mit.edu
[11] http://mailman.mit.edu/mailman/listinfo/moses-support
[12] mailto:Moses-support@mit.edu
[13] http://mailman.mit.edu/mailman/listinfo/moses-support