Hey Венци,

Did you by any chance binarize your phrase tables from a raw text file or
from a gzipped file (or any other supported compressed text format)? I recently
ran into similar issues with my phrase table (ProbingPT) when the input
phrase table had not been compressed before binarization. I wasn't able
to trace the cause; I just make sure I gzip every phrase table before
binarizing it.
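
A minimal sketch of that workaround, assuming a plain-text phrase table at a path you choose (the name `gzip_phrase_table` and the example path are hypothetical). It only produces the compressed copy, the same thing running gzip on the file would do, and leaves the binarization step itself untouched:

```python
import gzip
import shutil


def gzip_phrase_table(src: str) -> str:
    """Compress a plain-text phrase table to <src>.gz and return the new path.

    Hypothetical helper illustrating the "gzip before binarizing" workaround;
    it is not part of Moses.
    """
    dst = src + ".gz"
    # Copy the raw bytes unchanged: the table is UTF-8 text and gzip works
    # on bytes, so nothing is decoded or re-encoded along the way.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst  # the original file is left in place
```

Usage: `gzip_phrase_table("model/phrase-table")` writes `model/phrase-table.gz`, which you would then hand to whichever binarizer you use instead of the uncompressed file.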



On Mon, Mar 30, 2015 at 10:11 AM, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:

> Forgot to add: we use the compact phrase table and Moses on older and
> newer Ubuntu versions with Arabic, Chinese, Korean, Japanese and Russian,
> in both directions, with no problems. Those puny German umlauts should not
> be a challenge. :)
> On 30.03.2015 at 11:08, Marcin Junczys-Dowmunt wrote:
> Hi,
> the phrase table (and, as far as I know, Moses in general) is
> Unicode-agnostic as long as you use UTF-8. Input is handled as raw byte
> sequences; most of the time only numeric identifiers are involved.
> This sounds more like a couple of messed-up systems on your side, especially
> the part where some self-compiled systems work and others don't. I can't
> give you much more insight, unfortunately.
> Best,
> Marcin
> On 30.03.2015 at 10:53, "Венцислав Жечев (Ventsislav Zhechev)" wrote:
> Hi all,
> I’m having this really weird Unicode issue when using compact phrase
> tables that could be related to endianness somehow, but I’ve no idea how.
> I compiled the training tools from v3 on my Mac and built a few models
> using compact phrase (and reordering) tables and KenLM, including (for
> simplicity) a recasing model for DE (download it from
> https://autodesk.box.com/DE-Recaser). Things become strange when I try to
> use the models, though:
> 1. Everything works fine when I use the decoder binary I compiled myself
> on the Mac (10.10.2, self-built Boost 1.57).
> 2. Unicode input is not recognised when I use the binary from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/macosx-yosemite/
> (words like ‘für’ or ‘ausführlich’ are marked as UNK).
> 3. Unicode input is not recognised when I use a binary I compiled myself
> on Ubuntu 12.04.5 (self-built Boost 1.57).
> 4. Everything works fine when I use the binary from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
> I tested the above with the queryPhraseTableMin tool (rather than the
> decoder) and got the same results, which makes me think this could somehow
> be related to a binary incompatibility in the way the phrase table is
> compacted. I haven’t investigated deeper than that, though.
> Any clues?
> One would say: just use the Linux binary on Linux, then... However, I have
> a number of CentOS/RHEL 5 and 6 boxes where the pre-compiled binary
> doesn’t work because the system glibc is too old. So there I need to
> compile Moses myself, but then Unicode isn’t recognised...
> Cheers,
> Ventzi
> –––––––
> *Dr. Ventsislav Zhechev*
> Computational Linguist, Certified ScrumMaster®
> Platform Architecture and Technologies
> Localisation Services
> *MAIN* +41 32 723 91 22
> *FAX* +41 32 723 93 99
> *http://VentsislavZhechev.eu*
> *Autodesk, Inc.*
> Rue de Puits-Godet 6
> 2000 Neuchâtel, Switzerland
> *www.autodesk.com <http://www.autodesk.com/>*
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support