.
Cheers,
Ventzi
> On 30.03.2015 at 11:08, moses-support-requ...@mit.edu wrote:
>
> Date: Mon, 30 Mar 2015 11:08:13 +0200
> From: Marcin Junczys-Dowmunt
> Subject: Re: [Moses-support] Unicode Issues when Using Compact Phrase
> Table, Binaries vs. Own Build
> To:
Sounds like a case of composed characters.
Try passing the input through this:
uconv -f utf8 -t utf8 -x Any-NFKC --callback skip --remove-signature
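For anyone without ICU's uconv at hand, here is a minimal Python sketch of roughly the same clean-up, using only the standard library. The function name normalize_utf8 is made up for illustration, and the mapping of flags is an approximation: errors="ignore" stands in for --callback skip, and stripping a leading U+FEFF stands in for --remove-signature.

```python
import unicodedata


def normalize_utf8(raw: bytes) -> bytes:
    """Approximate `uconv -f utf8 -t utf8 -x Any-NFKC --callback skip --remove-signature`.

    Hypothetical helper, not part of Moses or ICU.
    """
    # Decode as UTF-8 and silently drop bytes that do not decode
    # (an approximation of --callback skip).
    text = raw.decode("utf-8", errors="ignore")
    # Drop a byte-order mark at the very start of the input
    # (an approximation of --remove-signature).
    if text.startswith("\ufeff"):
        text = text[1:]
    # Apply NFKC: compatibility decomposition followed by canonical composition,
    # so "u" + combining diaeresis becomes the single code point U+00FC.
    return unicodedata.normalize("NFKC", text).encode("utf-8")


if __name__ == "__main__":
    import sys

    # Read all of stdin as bytes, write the normalized bytes to stdout.
    sys.stdout.buffer.write(normalize_utf8(sys.stdin.buffer.read()))
```

Used as a filter (python normalize.py < input.txt > output.txt, with normalize.py being whatever you name the file), it should be applied to both the training data and the decoder input so that the two agree.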
On 03/30/2015 04:53 AM, "Венцислав Жечев (Ventsislav Zhechev)" wrote:
> Hi all,
>
> I’m having this really weird Unicode issue when using compact p
Hey Венци,
Did you by any chance binarize your phrase tables from a raw text format or
from gunzip (or any other supported compressed text formats)? I recently
ran into similar issues with my phrase table (ProbingPT) when the input
phrase table had not been compressed during binary creation. I wasn
Forgot to add that we use the compact phrase table and Moses on older
and newer Ubuntu versions with Arabic, Chinese, Korean, Japanese, Russian
in both directions with no problems. Those puny German umlauts should not
be a challenge. :)
On 30.03.2015 at 11:08, Marcin Junczys-Dowmunt wrote:
H
Hi,
the phrase table and, as far as I know, Moses in general are
Unicode-agnostic, as long as you use UTF-8. Input is handled as raw byte
sequences; most of the time there are only numeric identifiers.
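To illustrate why byte-level handling makes composed characters matter, here is a toy Python sketch. The dictionary below is purely hypothetical and is not how Moses or the compact phrase table stores anything; it only mimics a lookup keyed on raw UTF-8 bytes that maps to a numeric identifier.

```python
import unicodedata

# Two renderings of the same visible word "für":
composed = "f\u00fcr"      # precomposed U+00FC
decomposed = "fu\u0308r"   # "u" followed by combining diaeresis U+0308

# Hypothetical byte-keyed table: raw UTF-8 bytes -> numeric identifier.
table = {composed.encode("utf-8"): 17}


def lookup(phrase: str):
    """Look a phrase up by its raw UTF-8 bytes, with no Unicode awareness."""
    return table.get(phrase.encode("utf-8"))


# The byte sequences differ, so the decomposed form is simply not found.
found_composed = lookup(composed)
found_decomposed = lookup(decomposed)
# Normalizing the input to the form used in the table restores the match.
found_after_nfc = lookup(unicodedata.normalize("NFC", decomposed))
```

In this sketch the composed input is found, the decomposed input is not, and normalizing the decomposed input first makes the lookup succeed again. The point is only that a byte-agnostic system needs the same normalization form in training data and in input.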
Sounds more like a couple of messed-up systems on your side, especially
the part where self-com