Karel, I'm way overdue on this email, so perhaps you've dropped this entirely, but I thought I'd respond to these points (inline below).
> On Jan 19, 2017, at 7:18 PM, Karel Novotný <ka...@apc.org> wrote:
>
> On 19.1.2017 15:15, Matt Post wrote:
>> Karel, on this point, I don't think you should have to use the tutorials,
>> which tell you how to identify training data and build new translation
>> models yourself. I imagine that you would be more interested in downloading
>> pre-built models that don't really require you to be an expert in MT. See
>> this page:
>>
>> https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>
> Thanks Matt for the clarifications. I actually did download the language pairs
> yesterday and tried to run them to test the webapp by doing:
>
> ./joshua -server-port 5674 -server-type http
> and
> firefox "web/index.html?server=localhost&port=5674"
>
> However, it started consuming more and more memory until it jammed my
> computer completely (dual core 8GB ram). It might have been some bad
> config on my side though, or some other omission.

I don't remember which model you were using, but the memory it needs is going to be roughly proportional to the size reported by "du -sh model/" in the language pack directory. There is now a Dockerized Joshua that makes it easy to use KenLM, which reduces these requirements quite a bit. We could also build smaller models, if only to start out with, and then improve on them later.

> Our sysadmin should be able to make use of the API you mentioned.
>
> If all sentences must be sent separately... then I suppose that there is
> no way that we would automatically re-compose any formatting
> (paragraphs), right? Having translated text in one big block or as
> separate phrases on separate lines might make translating of messages a
> bit challenging.

I would think this could be recomposed rather easily, but yes, it would take some bookkeeping.
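To make the bookkeeping concrete, here is a minimal Python sketch, assuming a sentence-level translator that we model as a plain function. The names `split_document`, `translate_document`, and the `translate` callable are hypothetical (not part of Joshua or Tika), and the regex sentence splitter is a naive stand-in for a real one. The idea is simply to remember how many sentences each paragraph produced, translate sentence by sentence, and then stitch the paragraphs back together.

```python
import re
from typing import Callable, List

# Naive sentence boundary (an assumption): split after ., ! or ? followed by
# whitespace. A real deployment would use a proper sentence splitter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Paragraphs are separated by one or more blank lines.
_PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_document(text: str) -> List[List[str]]:
    """Return a list of paragraphs, each a list of sentences."""
    paragraphs = []
    for para in _PARAGRAPH_BREAK.split(text.strip()):
        # Re-flow hard-wrapped email lines into a single line first.
        flat = " ".join(para.split())
        sentences = [s for s in _SENTENCE_END.split(flat) if s]
        paragraphs.append(sentences)
    return paragraphs


def translate_document(text: str, translate: Callable[[str], str]) -> str:
    """Translate one sentence at a time, then rebuild the paragraph layout.

    `translate` is a hypothetical stand-in for whatever sends a single
    sentence to the MT server and returns its translation.
    """
    structure = split_document(text)
    # Flatten to the per-sentence requests a sentence-level server expects.
    flat = [s for para in structure for s in para]
    translated = [translate(s) for s in flat]
    # Use the recorded paragraph sizes to regroup the translations.
    out, pos = [], 0
    for para in structure:
        n = len(para)
        out.append(" ".join(translated[pos:pos + n]))
        pos += n
    return "\n\n".join(out)
```

With `str.upper` standing in for the translator, `translate_document("Hello there. How are you?\n\nSecond paragraph\nwraps here.", str.upper)` returns `"HELLO THERE. HOW ARE YOU?\n\nSECOND PARAGRAPH WRAPS HERE."`: the paragraph break survives even though each sentence was translated on its own. In a real setup, `translate` would wrap a request to the Joshua server; that request format is not covered here, and hard line wraps inside a paragraph are re-flowed rather than preserved.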
What we really want is a tool that could wrap Joshua and manage this for us: take a document, extract the sentences, get the translations (as generic annotations, perhaps), substitute them back in, and then return the document. Doesn't Tika do this, to an extent?

> As for the volume... While this is difficult to estimate, I've made a
> calculation based on monthly volume in list archives in the absolute
> peak month. The average per day is approx 1000 sentences, so it might be
> around 3000 in peak days.

This is nothing: minutes of computing at most, and there are knobs you can turn to adjust this.

> thanks for your interest in this.
>
> karel
>
>>
>> matt
>>
>>> On Jan 17, 2017, at 12:07 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>>
>>> Hi Karel,
>>> The short answer is yes.
>>> I would advise you to start at the Tutorial
>>> https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
>>> If you find anything which causes you problems then please write back here.
>>> Once you have skipped through the tutorial then you will have a much better
>>> feel for the workflow required.
>>> I can see the Apache Tika language identification and translate APIs being
>>> of particular use here when considered in a runtime context. We have a
>>> Joshua implementation over in Tika which can aid you in this task; however,
>>> try the Joshua tutorial first.
>>> Lewis
>>>
>>> On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann <mattm...@apache.org> wrote:
>>>
>>>> Hi Karel,
>>>>
>>>> I would recommend moving this thread to dev@joshua.incubator.apache.org
>>>> instead of the private list. I’ve moved private to BCC.
>>>>
>>>> Thank you.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 1/16/17, 6:58 AM, wrote:
>>>>
>>>> Hello,
>>>>
>>>> We would like to build a self-hosted machine translation system that
>>>> could be plugged into our mailman installs.
>>>> The objective is that the
>>>> members of our multicultural network would be able to send email in
>>>> their mother language and it would be delivered to the list
>>>> machine-translated (and vice versa).
>>>>
>>>> Are we on the right track with Joshua? I suppose that a lot of
>>>> configuration would be needed, but at this point I want to know if I am
>>>> not completely mistaken when considering your software for this.
>>>>
>>>> Thanks
>>>>
>>>> karel
>>>>
>>>> --
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> Karel Novotny
>>>> Knowledge Sharing & Network Development Coordinator
>>>> APC - The Association for Progressive Communications
>>>> https://www.apc.org
>>>> GSM: +420 605 243 246 (GMT +1)
>>>> jabber: ka...@riseup.net
>>>> Working/online: Monday - Thursday
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> My public OpenPGP key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>>>
>>> --
>>> http://home.apache.org/~lewismc/
>>> @hectorMcSpector
>>> http://www.linkedin.com/in/lmcgibbney
>>