Please subscribe to the mailing list before posting to it. You can subscribe here: http://mailman.mit.edu/mailman/listinfo/moses-support
I don't really understand your question. All characters are taken into account by the decoder and the training algorithms. There are some reserved characters that you must not use: [ ] | < > &

You might want to put something into an XML tag, e.g. <private-xml ff="....">. I think the decoder will then ignore it.

On 18 March 2014 14:20, <moses-support-ow...@mit.edu> wrote:

> As list administrator, your authorization is requested for the
> following mailing list posting:
>
> List: Moses-support@mit.edu
> From: arnaud.gicq...@linguacustodia.com
> Subject: Issue about sentence segmentation
> Reason: Post by non-member to a members-only list
>
> At your convenience, visit:
>
> http://mailman.mit.edu/mailman/admindb/moses-support
>
> to approve or deny the request.
>
>
> ---------- Forwarded message ----------
> From: Arnaud Gicquel <arnaud.gicq...@linguacustodia.com>
> To: moses-support@MIT.EDU
> Cc:
> Date: Tue, 18 Mar 2014 15:20:38 +0100
> Subject: Issue about sentence segmentation
>
> Hi all
>
> I am trying to develop a specific segmenter. The goal is to send Moses
> decoder sentences instead of large textual flows syntactically incoherent.
> In order to integrate this segmenter in an automatic workflow of documents
> translation. I would define as a sentence delimiter any character that the
> decoder does not take into account in its statistics.
>
> Is there a completely neutral character (I don't want it to be considered
> unknown) that I could use as a delimiter?
>
> Thank you for your help
>
> Arnaud Gicquel
>
> --
> Lingua Custodia
> 1, Place Charles de Gaulle
> 78180 Montigny le Bretonneux - France
> Tel : 33 1 30 44 04 23
> Email : arnaud.gicq...@linguacustodia.com
> Website : www.linguacustodia.com
>
>
> ---------- Forwarded message ----------
> From: moses-support-requ...@mit.edu
> To:
> Cc:
> Date:
> Subject: confirm d2337e56c58e533c4286cc73dbb52d9352c94e86
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message. Do this if the message is
> spam. If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list. The Approved: header can also appear in the first line
> of the body of the reply.
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
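P.S. To illustrate the point about reserved characters, here is a minimal sketch of how a segmenter could clean its output before passing it to the decoder. It replaces the six characters listed above with XML-style entities and writes one sentence per line. The function names `escape_reserved` and `prepare_input` are hypothetical and not part of Moses, the entity mapping is an assumption modelled on the escaping script shipped with Moses (check it against your installation), and the one-sentence-per-line layout assumes the decoder is reading plain text input.

```python
# Hypothetical helpers, not part of Moses.
# Assumption: each reserved character is replaced by an XML-style entity.
_RESERVED = str.maketrans({
    "&": "&amp;",
    "|": "&#124;",
    "<": "&lt;",
    ">": "&gt;",
    "[": "&#91;",
    "]": "&#93;",
})


def escape_reserved(sentence: str) -> str:
    """Replace the reserved characters in a single sentence."""
    # str.translate works character by character, so the "&" produced by
    # one replacement is never escaped a second time.
    return sentence.translate(_RESERVED)


def prepare_input(sentences: list[str]) -> str:
    """Escape each sentence and put one sentence on each line."""
    return "".join(escape_reserved(s.strip()) + "\n" for s in sentences)


if __name__ == "__main__":
    print(prepare_input(["Rates rose [see table] & fell.", "A | B"]), end="")
```

With this layout the newline itself acts as the sentence boundary, so no extra delimiter character has to appear inside the text.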
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support