FYI, I stress-tested the Joshua server with the following protocol: for both
the TCP and HTTP servers, I started a six-thread server, and then sent five
simultaneous 16k documents at each. The translation times were as follows:
TCP (times: 8:07 8:06 8:06):

    for x in 1 2 3 4; do
      for num in $(seq 1 5); do
        cat corpus.es | nc localhost 5674 > t.tcp.$num &
      done
      time wait
    done

HTTP (times: 7:25 7:34 7:20):

    for x in 1 2 3 4; do
      for num in $(seq 1 5); do
        /home/hltcoe/mpost/code/joshua/scripts/support/query_http.py -s localhost -p 5674 corpus.es > t.out.$num &
      done
      time wait
    done

The HTTP query takes 100 lines of the test set at a time, constructs the
RESTful query string (with 100 url-encoded "q=..." lines), and sends it to the
server.
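
For anyone who wants to reproduce the batching without the script, here is a minimal sketch of how such a query string could be built. It is not the actual query_http.py: the function name batch_queries and the sample sentences are made up for illustration, and it only constructs the query strings (how they are sent, and to which path on the server, is left out).

```python
from urllib.parse import urlencode


def batch_queries(lines, batch_size=100):
    """Yield one URL-encoded query string per batch of input lines.

    Hypothetical helper for illustration; not part of Joshua.
    """
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        # One "q" parameter per source sentence, kept in input order.
        yield urlencode([("q", line) for line in batch])


if __name__ == "__main__":
    # Made-up sample input; in the stress test this would be corpus.es.
    sentences = ["hola mundo", "buenos dias"]
    for query in batch_queries(sentences, batch_size=100):
        print(query)
```

Passing a list of ("q", value) pairs to urlencode keeps the sentences in order and repeats the "q" key once per sentence, which is the shape described above.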
So the bottom line is that the HTTP server both has an extended
Google-translate API (which also supports other things like adding rules) and
is a bit faster.
I'm documenting the RESTful API here:
https://cwiki.apache.org/confluence/display/JOSHUA/RESTful+API
matt
> On Mar 3, 2017, at 11:24 AM, Matt Post <[email protected]> wrote:
>
> Folks,
>
> I've updated the code with a few changes that will support Dockerized
> language packs. The nice thing is that this makes it easy to include KenLM.
>
> Here are some changes that were made:
>
> - Joshua now notes what directory the config file was found in and loads
> relative paths found in the config file relative to that directory
> automatically. This means you don't have to "cd" to the LP (language pack)
> directory before running Joshua.
>
> - I fixed the HTTP server to take multiple "q=" lines, just like the Google
> translate API. Before, it only took one "q=" line. This should mean (I'll
> test later today) that the HTTP server can handle throughput essentially at
> the rates of the TCP server.
>
> - I added (but haven't pushed yet) the KenLM model files to the language
> packs. In addition, I added a file "joshua.config.kenlm". These are not used
> except by Docker.
>
> - I fixed the docker setup. See the new file:
>
>
> https://github.com/apache/incubator-joshua/blob/master/distribution/docker/kenlm/Dockerfile
>
> This docker container builds KenLM. It expects to be run with docker
> mounting an existing language pack to /model. It then runs Joshua with the
> joshua.config.kenlm file, as a server in HTTP mode. See the README
> file for information:
>
>
> https://github.com/apache/incubator-joshua/tree/master/distribution/docker/kenlm
>
> If anyone wants to test this out, please do. You can grab an updated language
> pack (version 3) here:
>
>
> http://cs.jhu.edu/~post/language-packs/apache-joshua-es-en-2017-03-03.tgz
>
> (Warning: 9 GB)
>
> matt
>
>
>> On Nov 23, 2016, at 10:14 AM, kellen sunderland
>> <[email protected]> wrote:
>>
>> Yeah it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
>> then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
>> something similar. I think the default command should eventually be to run
>> the http server, so ideally we'd just do 'docker run -p 5674
>> kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
>> server on port 5674.
>>
>> Good point on Perl + Python, I can add them.
>>
>> -Kellen
>>
>> On Wed, Nov 23, 2016 at 3:22 PM, Matt Post <[email protected]> wrote:
>>
>>> Okay, I have this with
>>>
>>> docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>>>
>>> It seems we are missing Perl (./prepare.sh fails), and we should replace
>>> the LanguageModel line with a KenLM instance and build that. I bet we'll
>>> need Python, too.
>>>
>>>
>>>
>>>
>>>> On Nov 23, 2016, at 8:15 AM, Matt Post <[email protected]> wrote:
>>>>
>>>> Kellen, can I bother you to post a few first steps? I've successfully
>>>> pulled this down to my Mac but now do not know how to find it, edit it, or
>>>> run it. I'm poring through the documentation and will find it eventually
>>>> but this would save me a bit of time.
>>>>
>>>>
>>>>> On Nov 23, 2016, at 8:07 AM, kellen sunderland <[email protected]> wrote:
>>>>>
>>>>> Yes my next step was going to be getting it hosted officially.
>>>>>
>>>>> I'll go ahead and open a ticket. I think I'll hold off on pushing to the
>>>>> Apache account until I've done a little more testing though.
>>>>>
>>>>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" <[email protected]> wrote:
>>>>>
>>>>>> Hi Kellen,
>>>>>> Nice :)
>>>>>> Another option is for us to host these via the Apache account.
>>>>>> https://hub.docker.com/r/apache/
>>>>>> We could then add a badge to our README which points to the Dockerfile(s).
>>>>>> Do you want to open a ticket over on the INFRA Jira for this?
>>>>>>
>>>>>> On Tue, Nov 22, 2016 at 1:57 PM, <[email protected]> wrote:
>>>>>>
>>>>>>> From: kellen sunderland <[email protected]>
>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>> Cc:
>>>>>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>>>>>> Subject: Re: Dockerhub hosted images
>>>>>>> Ok, the first image should be properly uploaded now.
>>>>>>>
>>>>>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>>>>>>
>>>>>>> -Kellen
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>>
>