I guess this refers to the "Pluggable preprocessing and OpenNLP" thread. I will probably be working on this next week.
Regards,
Tommaso

On Mon, 23 Jan 2017 at 18:22, Joern Kottmann <kottm...@gmail.com> wrote:

> Hello,
>
> no, I am on the dev list.
>
> Yes, the Moses files are one sentence per line, which is very similar
> to the OpenNLP default format, which is also one sentence per line. But
> the OpenNLP format requires document boundaries marked with new lines,
> which the Moses files don't have. Also, it seems they might need a bit
> more pre-processing. Not every line ends with an end-of-sentence
> character, which is an issue for our evaluators.
>
> The letsmt work will be released today with OpenNLP 1.7.1 and Moses
> will be included in the release afterwards. As soon as we have this
> merged it would be good if you could give it a try and provide us
> with feedback.
>
> Jörn
>
>
> On Thu, 2017-01-19 at 09:12 -0500, Matt Post wrote:
> > > On Jan 17, 2017, at 11:55 AM, Karel Novotný <ka...@apc.org> wrote:
> > >
> > > Hello Matt,
> > >
> > > Thanks for responding...
> > >
> > > On 17.1.2017 17:31, Matt Post wrote:
> > > > Hello,
> > > >
> > > > Joshua would be suitable for this. We have models built for FR→EN
> > > > and ES→EN. I want to improve these because certain data was
> > > > left out. I could also build ones for the other direction.
> > >
> > > That's excellent news. Can you please tell me a bit more about what
> > > you mean by having models for FR→EN and ES→EN? Does this mean that
> > > the tool is ready to be used by other applications (e.g. mailman)
> > > to auto-translate?
> > >
> > > Have you had any previous experience with an implementation similar
> > > to the one I described?
> >
> > This just means we have pre-built models (which we call "language
> > packs") that you can just download and immediately use to translate
> > from French to English and from Spanish to English. For the complete
> > list of language packs, along with instructions for how to use them,
> > see this page:
> >
> > https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> >
> > You can just download any of these, unpack them, and start
> > translating. The quality will vary, but for these two languages it
> > should be reasonable.
> >
> > To translate, the data you send to Joshua has to have already been
> > sentence-split, because Joshua expects to receive input one sentence
> > at a time. Joshua provides an API that you can make use of. Do you
> > have any expectations about your volume requirements? How many
> > sentences will you be translating per day?
> >
> > matt
> >
> > > > One question: what do you mean about 3rd party services being
> > > > "untrustworthy"?
> > >
> > > We wish to auto-translate lists with private conversations, so we
> > > cannot run those by systems where we don't know (don't have control
> > > of) what happens with the data. That's all, I didn't want to accuse
> > > anyone.
> >
> > Oh, that makes perfect sense. For some reason I assumed you were
> > translating public mailing lists, but if you're doing private ones,
> > it is reasonable to want to keep the data entirely in-house.
> >
> > > thanks
> > >
> > > karel
> > >
> > > > matt
> > > >
> > > > > On Jan 16, 2017, at 12:27 PM, Karel Novotný <ka...@apc.org> wrote:
> > > > >
> > > > > Hello developers,
> > > > >
> > > > > I am new to this list, so I am missing a lot of background.
> > > > > Apologies in advance for possibly dumb questions...
> > > > >
> > > > > We would like to build a self-hosted machine translation system
> > > > > that could be plugged into our Mailman installs. The objective
> > > > > is that the members of our multicultural network would be able
> > > > > to send email in their mother language and it would be
> > > > > delivered to the list machine-translated (and vice versa). The
> > > > > translation pairs we care about most are EN<->FR and EN<->ES.
> > > > >
> > > > > Our dream scenario is:
> > > > >
> > > > > 1. A translator machine is installed on our server, so the
> > > > > messages don't need to be run through untrustworthy 3rd party
> > > > > services (googletrans)
> > > > > 2. Mailman (or similar) is connected to such a translator
> > > > > 3. Mailing list users can opt to receive messages sent to the
> > > > > mailing list in the following format:
> > > > >
> > > > > ----
> > > > > Message body
> > > > > ----------------------
> > > > > Message body translated
> > > > > -----
> > > > >
> > > > > 4. Similarly, the system can be configured so that when
> > > > > receiving messages from specific senders the messages get
> > > > > translated from FR or ES into EN
> > > > >
> > > > > Our default language used on lists is EN.
> > > > >
> > > > > Is Joshua relevant for this? Any previous experience with a
> > > > > similar setup? I suppose that a lot of configuration would be
> > > > > needed, but at this point I want to know if I am not completely
> > > > > mistaken when considering your Joshua for this.
> > > > >
> > > > > Thanks
> > > > >
> > > > > karel
> > > > >
> > > > > -----------------------
> > > > >
> > > > > --
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > Karel Novotny
> > > > > Knowledge Sharing & Network Development Coordinator
> > > > > APC - The Association for Progressive Communications
> > > > > https://www.apc.org
> > > > > GSM: +420 605 243 246 (GMT +1)
> > > > > jabber: ka...@riseup.net
> > > > > Working/online: Monday - Thursday
> > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > My public OpenPGP key:
> > > > > https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
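
Regarding Matt's point above that the data sent to Joshua has to be sentence-split already: a minimal sketch of that preprocessing step in Python might look like the following. The function name split_sentences and the punctuation-based rule are assumptions made for illustration only; they are not part of Joshua or OpenNLP, and a trained sentence detector would handle abbreviations such as "e.g." far better.

```python
import re

# Naive rule (an assumption, not Joshua's or OpenNLP's method): a sentence
# ends at ".", "!" or "?" when that character is followed by whitespace.
# Abbreviations like "e.g." will be split incorrectly.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the sentences found in `text`, one per list entry."""
    # Collapse line breaks and repeated spaces so wrapped email text
    # is treated as one continuous paragraph.
    collapsed = " ".join(text.split())
    if not collapsed:
        return []
    return [sentence for sentence in _BOUNDARY.split(collapsed) if sentence]


if __name__ == "__main__":
    body = "Bonjour à tous. Comment allez-vous?\nMerci!"
    # Emit one sentence per line, the shape a line-oriented
    # translator would expect as input.
    for sentence in split_sentences(body):
        print(sentence)
```

Each returned entry could then be sent to the translator individually and the results joined back into the "Message body translated" part of the outgoing mail.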