Hi Barry,

I think the version is new. Below is an excerpt from tokenizer.perl:

    #escape special chars
    $text =~ s/\&/\&amp;/g;   # escape escape
    $text =~ s/\|/\&#124;/g;  # factor separator
    $text =~ s/\</\&lt;/g;    # xml
    $text =~ s/\>/\&gt;/g;    # xml
    $text =~ s/\'/\&apos;/g;  # xml
    $text =~ s/\"/\&quot;/g;  # xml
    $text =~ s/\[/\&#91;/g;   # syntax non-terminal
    $text =~ s/\]/\&#93;/g;   # syntax non-terminal
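
As a rough illustration of what the substitutions above do, here is a minimal Python sketch of the escape/unescape round trip. It is not the Moses code: the function names are hypothetical, and it assumes the replacement strings are the XML entities (&amp;, &#124;, &lt;, &gt;, &apos;, &quot;, &#91;, &#93;). Note that only the ASCII apostrophe is matched, so a Unicode variant such as U+2019 would pass through unescaped.

```python
# Minimal sketch of the tokenizer's escaping step and its inverse.
# Assumption: the entity strings below match those used by tokenizer.perl.
# The function names are hypothetical and exist only for this example.

# Order matters: "&" is escaped first so that the "&" introduced by the
# later entities is not escaped a second time.
ESCAPES = [
    ("&", "&amp;"),    # escape escape
    ("|", "&#124;"),   # factor separator
    ("<", "&lt;"),     # xml
    (">", "&gt;"),     # xml
    ("'", "&apos;"),   # xml (ASCII apostrophe only)
    ('"', "&quot;"),   # xml
    ("[", "&#91;"),    # syntax non-terminal
    ("]", "&#93;"),    # syntax non-terminal
]


def escape_special_chars(text: str) -> str:
    """Replace each special character with its entity, in list order."""
    for raw, entity in ESCAPES:
        text = text.replace(raw, entity)
    return text


def unescape_special_chars(text: str) -> str:
    """Undo the escaping; "&amp;" is restored last to avoid double decoding."""
    for raw, entity in reversed(ESCAPES):
        text = text.replace(entity, raw)
    return text


if __name__ == "__main__":
    tokenized = "the company 's 10th quarter"
    escaped = escape_special_chars(tokenized)
    print(escaped)
    print(unescape_special_chars(escaped))
    # A Unicode right single quotation mark is left untouched:
    print(escape_special_chars("the company \u2019s 10th quarter"))
```

If the model was trained on escaped text but the test input is not escaped (or the reverse), the token "'s" will not match anything in the phrase table and will be passed through untranslated.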

-----Original Message-----
From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk]
Sent: Tuesday, August 07, 2012 5:55 PM
To: Tan, Jun
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] how does Moses handle with the apostrophes?

Hi Jun

Recent versions of the tokeniser have a line like

    $text =~ s/\'/\&apos;/g; # xml

to escape apostrophes.

cheers - Barry

On 07/08/12 09:51, Tan, Jun wrote:
> Hi Barry,
>
> How do I check the Moses version? I'm sure that the tokeniser used for
> training is the same as the one used for testing. I'm using the Stanford
> Word Segmenter for Chinese.
>
> -----Original Message-----
> From: Barry Haddow [mailto:bhad...@staffmail.ed.ac.uk]
> Sent: Tuesday, August 07, 2012 4:43 PM
> To: Tan, Jun
> Cc: tah...@precisiontranslationtools.com; moses-support@mit.edu
> Subject: Re: [Moses-support] how does Moses handle with the apostrophes?
>
> Hi Jun
>
> Is the apostrophe in your source data an ASCII apostrophe, or a Unicode
> variant (use xxd to check this)? As Tom said, recent versions of the Moses
> tokeniser escape apostrophes, so either you're using an old version, or it
> does not recognise it as an apostrophe.
>
> Make sure you are using the same tokeniser in training and test.
>
> cheers - Barry
>
> On 07/08/12 06:38, jun....@emc.com wrote:
>> Yes, I’m using Moses's tokenizer.perl for English, and Moses was
>> installed in June, so the version should be relatively new.
>> Do you have any ideas how to fix it?
>>
>> From: Tom Hoar [mailto:tah...@precisiontranslationtools.com]
>> Sent: Tuesday, August 07, 2012 1:13 PM
>> To: Tan, Jun
>> Cc: moses-support@mit.edu
>> Subject: Re: [Moses-support] how does Moses handle with the apostrophes?
>>
>> If you're using Moses' tokenizer.perl script, the English handling separates
>> "company's" into "company 's". In recent (~2 months) moses github
>> releases, the tokenizer.perl script also escapes the string to
>> "company &apos;s". The English detokenizer unescapes the "&apos;s" to "'s"
>> and restores it without the preceding space.
>>
>> On Tue, 7 Aug 2012 00:33:07 -0400, <jun....@emc.com> wrote:
>> Hi all,
>>
>> When I use Moses to translate sentences that contain apostrophes, it
>> doesn’t work correctly.
>>
>> Source:
>> EMC Corporation (NYSE:EMC) today reported strong financial results for the
>> second quarter of 2012, marking the company's 10th consecutive quarter of
>> double-digit year-over-year growth for consolidated revenue, GAAP net
>> income, and GAAP and non-GAAP EPS. EMC expects to achieve its full-year 2012
>> goals for consolidated revenue, non-GAAP EPS and free cash flow.
>>
>> Translation result:
>> 2012 年 7 月 24 日 — EMC 公司 ( NYSE : EMC) 今天 报告 了 强有力 的 财务 业绩 2012 年 第 2
>> 季度 , 标志 着 公司 's 连续 10 个 季度 实现 两 位 数 的 同比 增长 , 以 实现 整合 的 收入 、 GAAP 净
>> 收入
>> 和 GAAP 和 非 GAAP 每 股 收益 。 EMC 预计 到 2012 年 实现 其 目标 的 要求 年 全 年 的 合并 收入 、
>> 非 GAAP EPS 和 自由 现金流 。
>>
>> As we can see, “company's” is translated as “公司 's”: the apostrophe (')
>> and the letter "s" were left untranslated.
>> Does anybody know the cause of this issue? Do I need some other module to
>> handle it? Does anybody know how to fix it? Below is an example:
>>
>> Thanks
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.