Hi Jun

Recent versions of the tokeniser have a line like

$text =~ s/\'/\&apos;/g;  # xml

to escape apostrophes.
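
As a rough illustration only (a Python sketch, not the tokenizer.perl code itself), the escape step and its reverse can be written as below. The function names are hypothetical, and the set of "apostrophe-like" characters is my own assumption rather than anything the Moses scripts define. The last helper reports which apostrophe variant a string actually contains, which is the same thing you would look for in xxd output.

```python
def escape_apostrophes(text: str) -> str:
    """Replace each ASCII apostrophe (U+0027) with the XML entity &apos;."""
    return text.replace("'", "&apos;")


def unescape_apostrophes(text: str) -> str:
    """Reverse of escape_apostrophes: turn &apos; back into an ASCII apostrophe."""
    return text.replace("&apos;", "'")


def apostrophe_code_points(text: str) -> list:
    """Return (character, code point) pairs for apostrophe-like characters.

    The candidate set is an assumption made for this sketch.
    """
    candidates = {"'", "\u2018", "\u2019", "\u02bc", "\uff07"}
    return [(ch, f"U+{ord(ch):04X}") for ch in text if ch in candidates]


tokenised = "the company 's 10th quarter"
escaped = escape_apostrophes(tokenised)
print(escaped)                        # the company &apos;s 10th quarter
print(unescape_apostrophes(escaped))  # the company 's 10th quarter
```

Note that a Unicode right single quotation mark (U+2019) is not touched by a plain ASCII replacement like this one, so a string containing it passes through unescaped.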

cheers - Barry

On 07/08/12 09:51, Tan, Jun wrote:
> Hi Barry,
>
> How do I check the Moses version?  I'm sure that the tokeniser used for 
> training is the same as the one used for testing. I'm using the Stanford 
> Word Segmenter for Chinese.
>
> -----Original Message-----
> From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk]
> Sent: Tuesday, August 07, 2012 4:43 PM
> To: Tan, Jun
> Cc: tah...@precisiontranslationtools.com; moses-support@mit.edu
> Subject: Re: [Moses-support] how does Moses handle with the apostrophes?
>
> Hi Jun
>
> Is the apostrophe in your source data an ascii apostrophe, or a unicode 
> variant (use xxd to check this)? As Tom said, recent versions of the Moses 
> tokeniser escape apostrophes, so either you're using an old version, or it 
> does not recognise it as an apostrophe.
>
> Make sure you are using the same tokeniser in training and test.
>
> cheers - Barry
>
> On 07/08/12 06:38, jun....@emc.com wrote:
>> Yes, I’m using Moses's tokenizer.perl for English, and Moses was 
>> installed in June, so the version should be relatively new.
>> Do you have any idea how to fix it?
>> From: Tom Hoar [mailto:tah...@precisiontranslationtools.com]
>> Sent: Tuesday, August 07, 2012 1:13 PM
>> To: Tan, Jun
>> Cc: moses-support@mit.edu
>> Subject: Re: [Moses-support] how does Moses handle with the apostrophes?
>>
>>
>> If you're using Moses' tokenizer.perl script, the English handling separates 
>> "company's" into "company 's". In recent (~2 months) moses github 
>> releases, the tokenizer.perl script also escapes the string to 
>> "company &apos;s". The English detokenizer unescapes the "&apos;s" to "'s" 
>> and restores it without the preceding space.
>>
>>
>>
>> On Tue, 7 Aug 2012 00:33:07 -0400, <jun....@emc.com> wrote:
>> Hi all,
>>
>> When I use Moses to translate sentences that contain apostrophes, it 
>> doesn’t work correctly.
>> Source:
>> EMC Corporation (NYSE:EMC) today reported strong financial results for the 
>> second quarter of 2012, marking the company's 10th consecutive quarter of 
>> double-digit year-over-year growth for consolidated revenue, GAAP net 
>> income, and GAAP and non-GAAP EPS. EMC expects to achieve its full-year 2012 
>> goals for consolidated revenue, non-GAAP EPS and free cash flow.
>>
>> Translation result:
>> 2012 年 7 月 24 日 — EMC 公司 ( NYSE : EMC) 今天 报告 了 强有力 的 财务 业绩 2012 年 第 2
>> 季度 , 标志 着 公司 's 连续 10 个 季度 实现 两 位 数 的 同比 增长 , 以 实现 整合 的 收入 、 GAAP 净 收入
>> 和 GAAP 和 非 GAAP 每 股 收益 。 EMC 预计 到 2012 年 实现 其 目标 的 要求 年 全 年 的 合并 收入 、
>> 非 GAAP EPS 和 自由 现金流 。
>>
>> As we can see, “company's” is translated as “公司 's”: the apostrophe (') 
>> and the letter s were left untranslated.
>> Does anybody know the cause of this issue? Do I need some other module to 
>> handle it? Does anybody know how to fix it?  The example is above.
>>
>>
>> Thanks
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland, 
> with registration number SC005336.
>
>



