Hello,

No, I am on the dev list.

Yes, the Moses files have one sentence per line, which is very similar
to the OpenNLP default format, also one sentence per line. But the
OpenNLP format requires document boundaries to be marked with empty
lines, which the Moses files don't have. They also seem to need a bit
more pre-processing: not every line ends with an end-of-sentence
character, which is an issue for our evaluators.
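
To illustrate the kind of pre-processing I mean, here is a minimal
sketch in Python. It is not part of OpenNLP or Moses; the function
names and the set of characters treated as sentence-final are
assumptions made up for this example, and the real evaluators may
apply different rules.

```python
# Characters treated as sentence-final here. This set is an assumption
# for illustration, not what the OpenNLP evaluators actually check.
SENTENCE_END = (".", "!", "?", "\u2026")
# Closing quotes and brackets that may follow the final punctuation.
TRAILING = "\"')]\u00bb\u201d\u2019"


def ends_sentence(line: str) -> bool:
    """Return True if the line ends with a sentence-final character."""
    stripped = line.strip().rstrip(TRAILING)
    return stripped.endswith(SENTENCE_END)


def split_clean_and_suspect(lines):
    """Partition non-empty lines into (clean, suspect) lists."""
    clean, suspect = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no sentence
        (clean if ends_sentence(text) else suspect).append(text)
    return clean, suspect


sample = [
    "This is a complete sentence.",
    "Chapter 1",
    "He said \"stop!\"",
    "",
    "a fragment without punctuation",
]
clean, suspect = split_clean_and_suspect(sample)
print(clean)
print(suspect)
```

Lines that land in `suspect` could be dropped or inspected by hand
before the data is used for training or evaluation.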

The letsmt work will be released today with OpenNLP 1.7.1, and Moses
support will be included in the release after that. Once we have this
merged, it would be good if you could give it a try and send us
feedback.

Jörn 


On Thu, 2017-01-19 at 09:12 -0500, Matt Post wrote:
> > On Jan 17, 2017, at 11:55 AM, Karel Novotný <ka...@apc.org> wrote:
> > 
> > Hello Matt,
> > 
> > Thanks for responding...
> > 
> > On 17.1.2017 17:31, Matt Post wrote:
> > > Hello,
> > > 
> > > Joshua would be suitable for this. We have models built for FR→EN
> > > and ES→EN. I want to improve these because certain data was
> > > left out. I could also build ones for the other direction.
> > 
> > That's excellent news. Can you please tell me a bit more about what
> > you mean by having models for FR→EN and ES→EN? Does this mean that
> > the tool is ready to be used by other applications (e.g. mailman)
> > to auto-translate?
> > 
> > Have you had any previous experience with an implementation similar
> > to the one I described?
> 
> This just means we have pre-built models (which we call "language
> packs") that you can just download and immediately use to translate
> from French to English and from Spanish to English. For the complete
> list of language packs, along with instructions for how to use them,
> see this page:
> 
>       https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> 
> You can just download any of these, unpack them, and start
> translating. The quality will vary, but for these two languages
> it should be reasonable.
> 
> To translate, the data you send to Joshua has to have already been
> sentence-split, because Joshua expects to receive input one sentence
> at a time. Joshua provides an API that you can make use of. Do you
> have any kind of expectations about your volume requirements? How
> many sentences will you be translating per day?
> 
> matt
> 
> 
> > > 
> > > One question — What do you mean about 3rd party services being
> > > "untrustworthy"?
> > 
> > We wish to auto-translate lists with private conversations, so we
> > cannot run them through systems where we don't know (and don't
> > control) what happens with the data. That's all, I didn't want to
> > accuse anyone.
> 
> Oh, that makes perfect sense. For some reason I assumed you were
> translating public mailing lists, but if you're doing private ones,
> it is reasonable to want to keep the data entirely in-house.
> 
> 
> > thanks
> > 
> > karel
> > 
> > > 
> > > matt
> > > 
> > > 
> > > > On Jan 16, 2017, at 12:27 PM, Karel Novotný <ka...@apc.org>
> > > > wrote:
> > > > 
> > > > Hello developers,
> > > > 
> > > > > I am new to this list, so I am missing a lot of background.
> > > > > Apologies in advance for any dumb questions...
> > > > 
> > > > > We would like to build a self-hosted machine translation
> > > > > system that could be plugged into our mailman installs. The
> > > > > objective is that the members of our multicultural network
> > > > > would be able to send email in their mother language and it
> > > > > would be delivered to the list machine-translated (and vice
> > > > > versa). The translation pairs we care about most are EN<->FR
> > > > > and EN<->ES.
> > > > 
> > > > Our dream scenario is:
> > > > 
> > > > > 1. A translator machine is installed on our server, so the
> > > > >    messages don't need to be run through untrustworthy 3rd
> > > > >    party services (googletrans)
> > > > > 2. Mailman (or similar) is connected to such a translator
> > > > > 3. Mailing list users can opt to receive messages sent to the
> > > > >    mailing list in the following format:
> > > > 
> > > > ----
> > > > Message body
> > > > ----------------------
> > > > Message body translated
> > > > -----
> > > > 
> > > > > 4. Similarly, the system can be configured so that messages
> > > > >    received from specific senders get translated from FR or
> > > > >    ES into EN
> > > > 
> > > > Our default language used on lists is EN
> > > > 
> > > > > Is Joshua relevant for this? Any previous experience with a
> > > > > similar setup? I suppose that a lot of configuration would be
> > > > > needed, but at this point I want to know whether I am
> > > > > completely mistaken in considering Joshua for this.
> > > > 
> > > > Thanks
> > > > 
> > > > karel
> > > > 
> > > > -----------------------
> > > > 
> > > > -- 
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > Karel Novotny 
> > > > Knowledge Sharing & Network Development Coordinator
> > > > APC - The Association for Progressive Communications 
> > > > https://www.apc.org
> > > > GSM: +420 605 243 246 (GMT +1)
> > > > jabber: ka...@riseup.net
> > > > Working/online: Monday - Thursday
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > My public OpenPGP key:
> > > > > https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
> > > > 
> > > > 
> > 
