I guess this refers to the "Pluggable preprocessing and OpenNLP" thread.
I will probably be working on this next week.

Regards,
Tommaso

On Mon, 23 Jan 2017 at 18:22, Joern Kottmann <kottm...@gmail.com>
wrote:

> Hello,
>
> no, I am on the dev list.
>
> Yes, the moses files are one sentence per line, which is very similar
> to the OpenNLP default format, which is also one sentence per line. But
> the OpenNLP format requires document boundaries marked with empty lines,
> which the moses files don't have. It also seems they might need a bit
> more pre-processing: not every line ends on an end-of-sentence
> character, which is an issue for our evaluators.
>
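
To make the format difference concrete, here is a minimal Python sketch, not
OpenNLP code. It assumes that an empty line is what marks a document boundary,
that ".", "!" and "?" are the end-of-sentence characters, and that grouping a
fixed number of lines into one "document" is acceptable. The names
missing_eos, add_document_boundaries and doc_size are hypothetical and chosen
only for illustration.

```python
# Assumed set of sentence-final characters; adjust per language.
EOS_CHARS = ".!?"


def missing_eos(lines):
    """Return the non-empty lines that do not end on a character in EOS_CHARS."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and line[-1] not in EOS_CHARS]


def add_document_boundaries(lines, doc_size=3):
    """Join one-sentence-per-line input, inserting an empty line after
    every doc_size sentences as an artificial document boundary."""
    sentences = [line.strip() for line in lines if line.strip()]
    out = []
    for index, sentence in enumerate(sentences):
        if index and index % doc_size == 0:
            out.append("")  # empty line = assumed document boundary
        out.append(sentence)
    return "\n".join(out) + "\n"
```

Calling missing_eos on the lines of a moses file would list the lines that
need attention before evaluation. The fixed doc_size is arbitrary, since the
moses files carry no real document boundaries.
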
> The letsmt work will be released today with OpenNLP 1.7.1 and moses
> will be included in the release afterwards. As soon as we have this
> merged it would be good if you could maybe give it a try and provide us
> with feedback.
>
> Jörn
>
>
> On Thu, 2017-01-19 at 09:12 -0500, Matt Post wrote:
> > > On Jan 17, 2017, at 11:55 AM, Karel Novotný <ka...@apc.org> wrote:
> > >
> > > Hello Matt,
> > >
> > > Thanks for responding...
> > >
> > > On 17.1.2017 17:31, Matt Post wrote:
> > > > Hello,
> > > >
> > > > > Joshua would be suitable for this. We have models built for FR→EN
> > > > > and ES→EN. I want to improve these because some data was left
> > > > > out. I could also build ones for the other direction.
> > >
> > > > That's excellent news. Can you please tell me a bit more about what
> > > > you mean by having models for FR→EN and ES→EN? Does this mean that
> > > > the tool is ready to be used by other applications (e.g. mailman) to
> > > > auto-translate?
> > > >
> > > > Have you had any previous experience with an implementation similar
> > > > to the one I described?
> >
> > This just means we have pre-built models (which we call "language
> > packs") that you can just download and immediately use to translate
> > from French to English and from Spanish to English. For the complete
> > list of language packs, along with instructions for how to use them,
> > see this page:
> >
> >       https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> >
> > You can just download any of these, unpack them, and start
> > translating. The quality will vary, but for these two language pairs
> > it should be reasonable.
> >
> > To translate, the data you send to Joshua has to have already been
> > sentence-split, because Joshua expects to receive input one sentence
> > at a time. Joshua provides an API that you can make use of. Do you
> > have any kind of expectations about your volume requirements? How
> > many sentences will you be translating per day?
> >
> > matt
> >
> >
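
As a rough illustration of the sentence-splitting step, the Python sketch
below splits text naively on ".", "!" or "?" followed by whitespace and hands
each sentence to a caller-supplied function. translate_sentence is a
hypothetical stand-in for whatever call actually reaches Joshua; it is not
Joshua's API, and a real deployment would likely want a proper sentence
detector, since abbreviations break this rule.

```python
import re

# Naive boundary: whitespace that follows ".", "!" or "?".
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, dropping empty fragments."""
    return [part.strip() for part in _BOUNDARY.split(text.strip()) if part.strip()]


def translate_text(text, translate_sentence):
    """Translate text one sentence at a time.

    translate_sentence is any callable taking one sentence (str) and
    returning its translation (str); it is a placeholder, not a Joshua call.
    """
    return " ".join(translate_sentence(s) for s in split_sentences(text))
```

Passing the translation step in as a callable keeps the splitting logic
independent of how the decoder is reached.
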
> > > >
> > > > One question — What do you mean about 3rd party services being
> > > > "untrustworthy"?
> > >
> > > > We wish to auto-translate lists with private conversations, so we
> > > > cannot run those through systems where we don't know (and don't
> > > > control) what happens with the data. That's all, I didn't want to
> > > > accuse anyone.
> >
> > Oh, that makes perfect sense. For some reason I assumed you were
> > translating public mailing lists, but if you're doing private ones,
> > it is reasonable to want to keep the data entirely in-house.
> >
> >
> > > thanks
> > >
> > > karel
> > >
> > > >
> > > > matt
> > > >
> > > >
> > > > > On Jan 16, 2017, at 12:27 PM, Karel Novotný <ka...@apc.org>
> > > > > wrote:
> > > > >
> > > > > Hello developers,
> > > > >
> > > > > > I am new to this list, so I am missing a lot of background.
> > > > > > Apologies in advance for any dumb questions...
> > > > >
> > > > > > We would like to build a self-hosted machine translation system
> > > > > > that could be plugged into our mailman installs. The objective is
> > > > > > that the members of our multicultural network would be able to
> > > > > > send email in their native language and it would be delivered to
> > > > > > the list machine-translated (and vice versa). The translation
> > > > > > pairs we care about most are EN<->FR and EN<->ES.
> > > > >
> > > > > Our dream scenario is:
> > > > >
> > > > > > 1. A translator machine is installed on our server, so the
> > > > > >    messages don't need to be run through untrustworthy 3rd party
> > > > > >    services (googletrans)
> > > > > > 2. Mailman (or similar) is connected to such a translator
> > > > > > 3. Mailing list users can opt to receive messages sent to the
> > > > > >    mailing list in the following format:
> > > > >
> > > > > ----
> > > > > Message body
> > > > > ----------------------
> > > > > Message body translated
> > > > > -----
> > > > >
> > > > > > 4. Similarly, the system can be configured so that when receiving
> > > > > >    messages from specific senders the messages get translated
> > > > > >    from FR or ES into EN
> > > > >
> > > > > Our default language used on lists is EN
> > > > >
> > > > > > Is Joshua relevant for this? Any previous experience with a
> > > > > > similar setup? I suppose that a lot of configuration would be
> > > > > > needed, but at this point I want to know whether I am completely
> > > > > > mistaken in considering Joshua for this.
> > > > >
> > > > > Thanks
> > > > >
> > > > > karel
> > > > >
> > > > > -----------------------
> > > > >
> > > > > --
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > Karel Novotny
> > > > > Knowledge Sharing & Network Development Coordinator
> > > > > APC - The Association for Progressive Communications
> > > > > https://www.apc.org
> > > > > > GSM: +420 605 243 246 (GMT +1)
> > > > > jabber: ka...@riseup.net
> > > > > Working/online: Monday - Thursday
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > My public OpenPGP key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
> > > > >
> > > > >
> > >
> > > --
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > Karel Novotny
> > > Knowledge Sharing & Network Development Coordinator
> > > APC - The Association for Progressive Communications
> > > > https://www.apc.org
> > > > GSM: +420 605 243 246 (GMT +1)
> > > jabber: ka...@riseup.net
> > > Working/online: Monday - Thursday
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > My public OpenPGP key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>
