Karel,

I'm way overdue on this email, so perhaps you've dropped this entirely, but I 
thought I'd respond to these points (inline below).

> On Jan 19, 2017, at 7:18 PM, Karel Novotný <ka...@apc.org> wrote:
> 
> 
> 
> On 19.1.2017 15:15, Matt Post wrote:
>> Karel — On this point, I don't think you should have to use the tutorials, 
>> which tell you how to identify training data and build new translation 
>> models yourself. I imagine that you would be more interested in downloading 
>> pre-built models that don't really require you to be an expert in MT. See 
>> this page:
>> 
>>      https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
> 
> Thanks, Matt, for the clarifications. I actually did download the language pairs
> yesterday and tried running them to test the webapp with:
> 
> ./joshua -server-port 5674 -server-type http
> and
> firefox "web/index.html?server=localhost&port=5674"
> 
> However, it started consuming more and more memory until it froze my
> computer completely (dual core, 8 GB RAM). It might have been a bad
> configuration on my side, though, or some other omission.

I don't remember which model you were using, but the memory it needs will be
roughly proportional to the output of "du -sh model/" in the language pack directory.
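
If it helps, here is a rough Python stand-in for that "du -sh model/" check, handy for comparing a language pack against the 8 GB on your machine before launching the server. It's only a sketch: it sums apparent file sizes, while du counts allocated disk blocks, so the two numbers can differ a little, and the size-to-memory relationship is just the rule of thumb above. The function names are my own, not anything that ships with Joshua.

```python
import os
import sys


def apparent_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`; symlinks are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(n: int) -> str:
    """Format a byte count roughly the way `du -h` does (powers of 1024)."""
    size = float(n)
    units = ["B", "K", "M", "G", "T"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}{units[-1]}"


if __name__ == "__main__":
    # Defaults to "model", the directory name used inside a language pack.
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "model"
    print(human_readable(apparent_size_bytes(model_dir)), model_dir)
```

Run it from inside the language pack directory and compare the figure with your free RAM.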

There is now a Dockerized Joshua that makes it easy to use KenLM, which reduces 
these requirements quite a bit.

We could also just build smaller models, if nothing else to start out with, and
then improve on them later.


> Our sysadmin should be able to make use of the API you mentioned.
> 
> If all sentences must be sent separately, then I suppose there is no way
> to automatically recompose any formatting (paragraphs), right? Getting the
> translated text back as one big block, or as separate phrases on separate
> lines, might make translating messages a bit challenging.

I would think this could be recomposed rather easily, but yes, it would take
some bookkeeping. What we really want is a tool that wraps Joshua and
manages this for us: take a document, extract the sentences, get the
translations (as generic annotations, perhaps), substitute them back in, and
then return the document. Doesn't Tika do this, to an extent?
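
To make the bookkeeping concrete, here is a minimal Python sketch of such a wrapper. It is not Tika and not an existing Joshua API. It assumes paragraphs are separated by blank lines and uses a naive regex sentence splitter; `translate` is a hypothetical placeholder for whatever sends one sentence to the Joshua server and returns the result. It keeps the paragraph structure but reflows each paragraph onto a single line, and does nothing special for quoted text or signatures.

```python
import re
from typing import Callable, List

# Naive splitter: breaks after ".", "!" or "?" followed by whitespace.
# A real deployment would want a proper sentence segmenter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines, keeping paragraph order and dropping empty ones."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Join hard-wrapped lines, then cut the paragraph into sentences."""
    flat = " ".join(paragraph.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def translate_document(text: str, translate: Callable[[str], str]) -> str:
    """Translate sentence by sentence, then rebuild the paragraphs.

    `translate` is a hypothetical stand-in for a call to the Joshua server.
    """
    out_paragraphs = []
    for para in split_paragraphs(text):
        translated = [translate(s) for s in split_sentences(para)]
        out_paragraphs.append(" ".join(translated))
    return "\n\n".join(out_paragraphs)


if __name__ == "__main__":
    msg = "Hello all. How are you?\n\nSee you soon!"
    # str.upper plays the role of the translator here.
    print(translate_document(msg, str.upper))
```

With `str.upper` standing in for the translator, the two-paragraph message comes back as the same two paragraphs, upper-cased. Swapping in a real translate call is the only part that touches Joshua.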


> As for the volume.... While this is difficult to estimate, I've made a
> calculation based on monthly volume in list archives in the absolute
> peak month. The average per day is approx 1000 sentences, so it might be
> around 3000 in peak days.

This is nothing: minutes of computing, at most, and there are knobs you can
turn to change this.
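
For a back-of-the-envelope check, the arithmetic is just sentences divided by decoding rate. The rate below (10 sentences per second) is purely a made-up placeholder, not a measured Joshua figure; time a few hundred of your own sentences and plug that number in instead.

```python
def minutes_of_compute(sentences: int, sentences_per_second: float) -> float:
    """Wall-clock minutes to translate `sentences` at a given decoding rate."""
    if sentences_per_second <= 0:
        raise ValueError("rate must be positive")
    return sentences / sentences_per_second / 60.0


if __name__ == "__main__":
    ASSUMED_RATE = 10.0  # hypothetical sentences/second; measure your own setup
    # Volumes from Karel's estimate: ~1000 on an average day, ~3000 at peak.
    for label, n in (("average day", 1000), ("peak day", 3000)):
        print(f"{label}: {minutes_of_compute(n, ASSUMED_RATE):.1f} min")
```

At that assumed rate it prints 1.7 min for an average day and 5.0 min for a peak day.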


> thanks for your interest in this.
> 
> karel
> 
>> 
>> matt
>> 
>> 
>>> On Jan 17, 2017, at 12:07 PM, lewis john mcgibbney <lewi...@apache.org> wrote:
>>> 
>>> Hi Karel,
>>> The short answer is yes.
>>> I would advise you to start with the tutorial:
>>> https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
>>> If you run into anything that causes you problems, please write back here.
>>> Once you have gone through the tutorial, you will have a much better
>>> feel for the workflow required.
>>> I can see the Apache Tika language identification and translation APIs being
>>> of particular use here in a runtime context. We have a
>>> Joshua implementation over in Tika that can aid you in this task; however,
>>> try the Joshua tutorial first.
>>> Lewis
>>> 
>>> On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann <mattm...@apache.org> wrote:
>>> 
>>>> Hi Karel,
>>>> 
>>>> I would recommend moving this thread to dev@joshua.incubator.apache.org
>>>> instead of the private list. I’ve moved private to BCC.
>>>> 
>>>> Thank you.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> 
>>>> 
>>>> On 1/16/17, 6:58 AM, wrote:
>>>> 
>>>>   Hello,
>>>> 
>>>>   We would like to build a self-hosted machine translation system that
>>>>   could be plugged into our Mailman installs. The objective is that the
>>>>   members of our multicultural network would be able to send email in
>>>>   their mother tongue and have it delivered to the list
>>>>   machine-translated (and vice versa).
>>>> 
>>>>   Are we on the right track with Joshua? I suppose a lot of
>>>>   configuration would be needed, but at this point I just want to know
>>>>   whether I am completely mistaken in considering your software for this.
>>>> 
>>>>   Thanks
>>>> 
>>>>   karel
>>>> 
>>>> 
>>>>   --
>>>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>   Karel Novotny
>>>>   Knowledge Sharing & Network Development Coordinator
>>>>   APC - The Association for Progressive Communications
>>>>   https://www.apc.org
>>>>   GSM: +420 605 243 246 (GMT +1)
>>>>   jabber: ka...@riseup.net
>>>>   Working/online: Monday - Thursday
>>>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>   My public OpenPGP key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> http://home.apache.org/~lewismc/
>>> @hectorMcSpector
>>> http://www.linkedin.com/in/lmcgibbney
>> 
> 
