Okay, I've fixed the nonbreaking_prefixes path issue.

The installation should now ignore your value of $JOSHUA entirely, preferring 
instead the bundled jar and scripts (maybe test this by unsetting $JOSHUA).
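For anyone wondering how a script can ignore a pre-set $JOSHUA, here is a minimal sketch of one approach. It derives the pack root from the script's own location and exports that value. This is a hypothetical illustration: the function name `pack_dir` is made up, and this may not be exactly what the bundled scripts do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: point $JOSHUA at the directory this script lives in,
# overriding any value already present in the user's environment.
set -euo pipefail

# Print the absolute directory containing this script. ${BASH_SOURCE[0]}
# falls back to $0 when the code is not being run from a file.
pack_dir() {
  cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd
}

# Whatever $JOSHUA was before, replace it with the bundled location.
JOSHUA="$(pack_dir)"
export JOSHUA
echo "Using the Joshua bundled in $JOSHUA" >&2
```

With something like this in place, the old value of $JOSHUA is irrelevant. Unsetting $JOSHUA before a test run is a quick way to confirm that nothing still depends on it.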

New version:

        http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-06.tgz 

Please note: my tests show that using BerkeleyLM results in a notable drop in 
performance (1–2 BLEU points across many test sets). I am worried that we have 
introduced a bug in LanguageModelFF.java. We use BerkeleyLM so that users don't 
have to compile KenLM, but we're probably going to need to provide the option 
to "upgrade" for those willing to try to compile it. Or we'll need a solution 
for distributing pre-built KenLM shared libraries...

matt



> On Oct 5, 2016, at 11:43 PM, John Hewitt <john...@seas.upenn.edu> wrote:
> 
> Quick further note -- I already had $JOSHUA set to a different directory,
> so initially all the lookups were failing.
> 
> It's possible that current users of Joshua will have it set as well when they download new
> language packs. This should be an obvious and quick fix for the user, but I
> don't know if there's something we could do in the name of making it even
> clearer. (Potentially checking whether $JOSHUA is the same as $PWD after
> the directory change in prepare.sh, and printing a warning if it's not?)
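John's suggested check could look something like the following minimal sketch. It assumes prepare.sh has already changed into the language pack directory. The function name `check_joshua_env` is hypothetical, and the check only warns rather than aborting.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: warn when a pre-existing $JOSHUA points somewhere
# other than the current (language pack) directory.
# Returns 1 after printing a warning to stderr on a mismatch, 0 otherwise.
check_joshua_env() {
  local here target
  here="$(pwd -P)"
  # Nothing to compare if the user never set $JOSHUA.
  if [[ -z "${JOSHUA:-}" ]]; then
    return 0
  fi
  # Resolve $JOSHUA to a physical path so symlinks don't cause false alarms;
  # if it can't be entered at all, compare the raw string instead.
  target="$(cd "$JOSHUA" 2>/dev/null && pwd -P || echo "$JOSHUA")"
  if [[ "$target" != "$here" ]]; then
    echo "WARNING: \$JOSHUA is set to '$JOSHUA', but this language pack is in '$here'." >&2
    echo "WARNING: lookups may fail; consider running 'unset JOSHUA' first." >&2
    return 1
  fi
  return 0
}
```

Returning a status as well as printing the warning would let prepare.sh decide later whether a mismatch should be fatal.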
> 
> -John
> 
> On Wed, Oct 5, 2016 at 11:32 PM, John Hewitt <john...@seas.upenn.edu> wrote:
> 
>> Thanks, Matt!
>> 
>> Some notes:
>> 
>> When piping input into prepare.sh, I get the following output:
>> 
>>     WARNING: No known abbreviations for language 'es', attempting fall-back to English version...
>>     ERROR: No abbreviations files found in /nlp/users/johnhew/apache-joshua-es-en-2016-10-05/scripts/preparation/nonbreaking_prefixes
>> 
>> It seems that line 12 of tokenize.pl,
>>
>>     my $mydir = "$ENV{JOSHUA}/scripts/preparation/nonbreaking_prefixes";
>>
>> should be:
>>
>>     my $mydir = "$ENV{JOSHUA}/scripts/nonbreaking_prefixes";
>> 
>> When I make this modification, it works just fine for me.
>> Also, tried in server mode -- seems to work without issue.
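The WARNING/ERROR pair John reported is what you would expect from a lookup that tries the requested language, falls back to English, and then gives up because the directory itself is wrong. Here is a minimal shell sketch of that logic, for illustration only. tokenize.pl itself is Perl, the file name pattern `nonbreaking_prefix.<lang>` is an assumption borrowed from the Moses tokenizer convention, and `find_prefix_file` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the abbreviation-file lookup with English fall-back.
# Assumes files are named nonbreaking_prefix.<lang> inside the given directory.
# Prints the chosen file on stdout; messages go to stderr.
find_prefix_file() {
  local dir="$1" lang="$2"
  # First choice: a prefix file for the requested language.
  if [[ -f "$dir/nonbreaking_prefix.$lang" ]]; then
    echo "$dir/nonbreaking_prefix.$lang"
    return 0
  fi
  echo "WARNING: No known abbreviations for language '$lang', attempting fall-back to English version..." >&2
  # Second choice: the English prefix file.
  if [[ -f "$dir/nonbreaking_prefix.en" ]]; then
    echo "$dir/nonbreaking_prefix.en"
    return 0
  fi
  # Neither exists, e.g. because $dir points at the wrong place.
  echo "ERROR: No abbreviations files found in $dir" >&2
  return 1
}
```

Pointing this at the old `scripts/preparation/nonbreaking_prefixes` path, which does not exist in the pack, prints both messages in the order John saw them.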
>> 
>> (For reference -- executed on an openSUSE cluster)
>> 
>> -John
>> 
>> 
>> 
>> On Wed, Oct 5, 2016 at 10:36 PM, Matt Post <p...@cs.jhu.edu> wrote:
>> 
>>> Hi folks,
>>> 
>>> I have managed to assemble an actual working language pack. Consider this
>>> a (near-final, I hope) draft of what we're rolling out for lots of
>>> languages. Please download it, check out the README and associated files,
>>> test it, and let me know what's missing or what needs to change.
>>> 
>>>        http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-05.tgz (2.1 GB)
>>> 
>>> Suggested use:
>>> 
>>>        tar xzvf apache-joshua-es-en-2016-10-05.tgz
>>>        echo "\"Yo quiero Taco Bell,\", él dijo." \
>>>                | ./apache-joshua-es-en-2016-10-05/prepare.sh \
>>>                | ./apache-joshua-es-en-2016-10-05/joshua
>>> 
>>> matt
>> 
>> 
>> 
