>> There is now a Dockerized Joshua that makes it easy to use KenLM, which
>> reduces these requirements quite a bit.
>
> Ok. I will talk to our sysadmin and see if he can do this. I myself
> don't know what 'dockerized' means in this context. If it is a separate
> pack/module, can you please point me to it (we are interested in
> En<->Fr, En<->Es, and Fr<->Es combinations)?

Docker is a containerization tool that makes it easy to share executable
code together with the environment it needs. Here, it facilitates compiling
KenLM, which some people have trouble with. We haven't built any models out
of English, or any models that don't include English, unfortunately.

>> We could also just build smaller models. If nothing else, to start out with,
>> and then improve on later.
>>
>>
>>> Our sysadmin should be able to make use of the API you mentioned.
>>>
>>> If all sentences must be sent separately... then I suppose that there is
>>> no way that we would automatically re-compose any formatting
>>> (paragraphs), right? Having the translated text in one big block, or as
>>> separate phrases on separate lines, might make translating messages a
>>> bit challenging.
>> I would think this could be recomposed rather easily, but yes, it would take
>> some bookkeeping. What we really want is a tool that could wrap Joshua and
>> manage this for us: take a document, extract the sentences, get the
>> translations (as generic annotations, perhaps), substitute them back in, and
>> then return the document. Doesn't Tika do this, to an extent?
>
> I don't know :-) But maybe someone else on this list has experience
> with this.
>
> Thanks Matt.
>
> karel
>
>>
>>
>>> As for the volume... While this is difficult to estimate, I've made a
>>> calculation based on the monthly volume in the list archives in the
>>> absolute peak month. The average per day is approx. 1000 sentences, so it
>>> might be around 3000 on peak days.
>> This is nothing: minutes of computing, at best, and there are knobs you can
>> turn to change this.
>>
>>
>>> Thanks for your interest in this.
>>>
>>> karel
>>>
>>>> matt
>>>>
>>>>
>>>>> On Jan 17, 2017, at 12:07 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>>>>
>>>>> Hi Karel,
>>>>> The short answer is yes.
>>>>> I would advise you to start at the Tutorial:
>>>>> https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
>>>>> If you find anything which causes you problems, then please write back
>>>>> here.
>>>>> Once you have skipped through the tutorial, you will have a much better
>>>>> feel for the workflow required.
>>>>> I can see the Apache Tika language identification and translate APIs
>>>>> being of particular use here when considered in a runtime context. We
>>>>> have a Joshua implementation over in Tika which can aid you in this
>>>>> task; however, try the Joshua tutorial first.
>>>>> Lewis
>>>>>
>>>>> On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann <mattm...@apache.org> wrote:
>>>>>
>>>>>> Hi Karel,
>>>>>>
>>>>>> I would recommend moving this thread to dev@joshua.incubator.apache.org
>>>>>> instead of the private list. I’ve moved private to BCC.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/16/17, 6:58 AM, wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We would like to build a self-hosted machine translation system that
>>>>>> could be plugged into our Mailman installs. The objective is that the
>>>>>> members of our multicultural network would be able to send email in
>>>>>> their native language and it would be delivered to the list
>>>>>> machine-translated (and vice versa).
>>>>>>
>>>>>> Are we on the right track with Joshua? I suppose that a lot of
>>>>>> configuration would be needed, but at this point I want to know that I
>>>>>> am not completely mistaken in considering your software for this.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> karel
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> Karel Novotny
>>>>>> Knowledge Sharing & Network Development Coordinator
>>>>>> APC - The Association for Progressive Communications
>>>>>> https://www.apc.org
>>>>>> GSM: +420 605 243 246 (GMT +1)
>>>>>> jabber: ka...@riseup.net
>>>>>> Working/online: Monday - Thursday
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> My public OpenPGP key:
>>>>>> https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>>>>>
>>>>> --
>>>>> http://home.apache.org/~lewismc/
>>>>> @hectorMcSpector
>>>>> http://www.linkedin.com/in/lmcgibbney
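
The paragraph bookkeeping Matt describes in the thread (take a message, extract the sentences, translate them one at a time, substitute them back in, and return the message) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Joshua's or Tika's API: `translate` is a hypothetical callback that would send ONE sentence to whatever MT server is running, paragraphs are assumed to be separated by blank lines, and the regex sentence splitter is a deliberately naive stand-in for a real one.

```python
import re
from typing import Callable, List

# Naive sentence boundary (an assumption for illustration): '.', '!' or '?'
# followed by whitespace. A real deployment would use a proper splitter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_paragraphs(text: str) -> List[str]:
    """Split a message body on blank lines, keeping non-empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Collapse the paragraph's line wraps, then split it into sentences."""
    flat = " ".join(paragraph.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def translate_message(text: str, translate: Callable[[str], str]) -> str:
    """Translate sentence by sentence, then rebuild the paragraph layout.

    `translate` is a hypothetical per-sentence callback; nothing here
    talks to a real MT server.
    """
    out_paragraphs = []
    for paragraph in split_paragraphs(text):
        # Remember which paragraph each sentence came from by translating
        # and re-joining within the paragraph loop.
        translated = [translate(s) for s in split_sentences(paragraph)]
        out_paragraphs.append(" ".join(translated))
    # Blank lines between paragraphs restore the original structure.
    return "\n\n".join(out_paragraphs)


if __name__ == "__main__":
    body = "Hello all.\nHow are you?\n\nSee you soon!"
    # Stand-in "translator" that just upper-cases each sentence.
    print(translate_message(body, lambda s: s.upper()))
```

With the upper-casing stand-in, the two paragraphs come back as `HELLO ALL. HOW ARE YOU?` and `SEE YOU SOON!`, still separated by a blank line. Line wraps inside a paragraph are deliberately dropped, on the assumption that the mail client will re-wrap the translated text; quoted-reply markers and signatures would need extra handling that this sketch does not attempt.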