Please subscribe to the Moses mailing list before posting to it. You can
subscribe here:
   http://mailman.mit.edu/mailman/listinfo/moses-support

To answer your question: the source line it blows up on seems to be there
just for debugging, so you can delete it or comment it out. Try deleting line
   moses/server/TranslationRequest.cpp:473
and let me know if it works.

Hieu Hoang
http://moses-smt.org/


---------- Forwarded message ----------
From: <moses-support-ow...@mit.edu>
Date: 12 October 2017 at 12:18
Subject: Moses-support post from qt.ngu...@gmx.de requires approval
To: moses-support-ow...@mit.edu


As list administrator, your authorization is requested for the
following mailing list posting:

    List:    Moses-support@mit.edu
    From:    qt.ngu...@gmx.de
    Subject: Moses-Server girerr::error on characters outside the BMP
    Reason:  Post by non-member to a members-only list

At your convenience, visit:

    http://mailman.mit.edu/mailman/admindb/moses-support

to approve or deny the request.


---------- Forwarded message ----------
From: Trung Nguyen <qt.ngu...@gmx.de>
To: moses-support@mit.edu
Cc:
Bcc:
Date: Thu, 12 Oct 2017 13:18:36 +0200
Subject: Moses-Server girerr::error on characters outside the BMP

I am running moses in server mode to translate from modern Vietnamese to
old Vietnamese characters. Many of these old characters are not in the
Basic Multilingual Plane of Unicode, for example the word "hai" corresponds
to the character "𠄩", which has the code point U+20129.

On the command line everything works fine. But in server mode characters
outside the BMP, i.e. code points above 0xFFFF, cause the server to
terminate.

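I am using a simple Python 3 script to query the Moses server:

import xmlrpc.client

# Connect to the Moses server's XML-RPC endpoint
client = xmlrpc.client.ServerProxy('http://localhost:8012/RPC2')
# Request a translation; the server terminates while handling this call
result = client.translate({'text': 'hai'})
translation = result.get('text')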

The error message I get:

Translating: hai
Line 0: Collecting options took 0.000 seconds at moses/Manager.cpp Line 141
Line 0: Search took 0.000 seconds
[moses/server/TranslationRequest.cpp:473] BEST TRANSLATION: 𠄩 [1]
[total=-2.740] core=(0.000,-1.000,1.000,0.000,0.000,0.000,0.000,-0.016,
0.000,0.000,0.000,0.000,0.000,0.000,-7.869)
terminate called after throwing an instance of 'girerr::error'
  what():  10-byte supposed UTF-8 string is not valid UTF-8.  UTF-8 string
contains a character not in the Basic Multilingual Plane (first byte
0xfffffff0)
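
For reference, a minimal sketch of why the character trips a BMP-only UTF-8
check. It only inspects the character from the example above; the helper name
non_bmp_chars is made up for illustration and is not part of Moses or of the
XML-RPC library. The "first byte 0xfffffff0" in the error is presumably the
0xf0 lead byte printed as a sign-extended char, but that is an assumption.

```python
# Inspect the character from the example: U+20129
ch = "\U00020129"

code_point = ord(ch)             # numeric code point
utf8 = ch.encode("utf-8")        # UTF-8 byte sequence
outside_bmp = code_point > 0xFFFF

print(hex(code_point), outside_bmp, utf8.hex())
# prints: 0x20129 True f0a084a9


def non_bmp_chars(text):
    """Hypothetical helper: return the characters of text above U+FFFF."""
    return [c for c in text if ord(c) > 0xFFFF]


print(non_bmp_chars("hai " + ch) == [ch])
```

Any code point above 0xFFFF needs four bytes in UTF-8, and the lead byte of a
four-byte sequence is always in the range 0xf0 to 0xf4, which matches the byte
reported in the error.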

Thank You

Trung Nguyen




_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
