That would be awesome.

matt
> On Oct 7, 2016, at 11:49 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>
> I was actually going to try and build KenLM into a maven package that can
> be easily distributed. I haven't had time to work on it too much but I
> think it shouldn't be too hard.
>
> On Thu, Oct 6, 2016 at 4:16 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
>> Okay, I've fixed the nonbreaking_prefixes path issue.
>>
>> The installation should now ignore your value of $JOSHUA entirely,
>> preferring instead the bundled jar and scripts (maybe test this by
>> unsetting $JOSHUA).
>>
>> New version:
>>
>> http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-06.tgz
>>
>> Please note: my tests show that using BerkeleyLM results in a notable drop
>> in performance (1–2 BLEU points across many test sets). I am worried that
>> we have introduced a bug in LanguageModelFF.java. We use BerkeleyLM so that
>> users don't have to compile KenLM, but we're probably going to need to
>> provide the option to "upgrade" for those willing to try to compile it. Or
>> we'll need a solution for distributing pre-built KenLM shared libraries...
>>
>> matt
>>
>>> On Oct 5, 2016, at 11:43 PM, John Hewitt <john...@seas.upenn.edu> wrote:
>>>
>>> Quick further note -- I already had $JOSHUA set to a different directory,
>>> so initially all the lookups were failing.
>>>
>>> It's possible current users of JOSHUA will as well when they download new
>>> language packs. This should be an obvious and quick fix for the user, but I
>>> don't know if there's something we could do in the name of making it even
>>> clearer. (Potentially checking whether $JOSHUA is the same as $PWD after
>>> the directory change in prepare.sh, and printing a warning if it's not?)
>>>
>>> -John
>>>
>>> On Wed, Oct 5, 2016 at 11:32 PM, John Hewitt <john...@seas.upenn.edu> wrote:
>>>
>>>> Thanks, Matt!
>>>>
>>>> Some notes:
>>>>
>>>> When piping input into prepare.sh, I get the following output:
>>>>
>>>>   WARNING: No known abbreviations for language 'es', attempting fall-back to
>>>>   English version...
>>>>   ERROR: No abbreviations files found in
>>>>   /nlp/users/johnhew/apache-joshua-es-en-2016-10-05/scripts/preparation/nonbreaking_prefixes
>>>>
>>>> Seems that line 12 of tokenize.pl:
>>>>   my $mydir = "$ENV{JOSHUA}/scripts/preparation/nonbreaking_prefixes";
>>>> should be:
>>>>   my $mydir = "$ENV{JOSHUA}/scripts/nonbreaking_prefixes";
>>>>
>>>> When I make this modification, it works just fine for me.
>>>> Also, tried in server mode -- seems to work without issue.
>>>>
>>>> (For reference -- executed on an openSUSE cluster)
>>>>
>>>> -John
>>>>
>>>> On Wed, Oct 5, 2016 at 10:36 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I have managed to assemble an actual working language pack. Consider this
>>>>> a (near-final, I hope) draft of what we're rolling out for lots of
>>>>> languages. Please download it, check out the README and associated files,
>>>>> test it, and let me know what's missing or what needs to change.
>>>>>
>>>>> http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-05.tgz (2.1 GB)
>>>>>
>>>>> Suggested use:
>>>>>
>>>>>   tar xzvf apache-joshua-es-en-2016-10-05.tgz
>>>>>   echo "\"Yo quiero Taco Bell,\", él dijo." \
>>>>>     | ./apache-joshua-es-en-2016-10-05/prepare.sh \
>>>>>     | ./apache-joshua-es-en-2016-10-05/joshua
>>>>>
>>>>> matt
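For reference, the $JOSHUA-vs-$PWD check John floated above could look roughly like the sketch below. This is only an illustration: the function name and the warning text are made up here, and nothing in it is taken from the actual prepare.sh. Since Matt's follow-up says the pack now ignores $JOSHUA entirely, a check like this would be purely informational.

```shell
#!/bin/sh
# Hypothetical helper (not from the real prepare.sh).
# Warns on stderr when $JOSHUA is set but points somewhere other than
# the language pack directory passed as the first argument.
warn_if_joshua_mismatch() {
    pack_dir=$1
    if [ -n "${JOSHUA:-}" ] && [ "$JOSHUA" != "$pack_dir" ]; then
        echo "WARNING: \$JOSHUA is set to '$JOSHUA', not '$pack_dir'" >&2
        return 1
    fi
    return 0
}
```

Assumed usage inside prepare.sh, after the directory change: `warn_if_joshua_mismatch "$PWD"`. The comparison is a plain string match, so two different spellings of the same directory (symlinks, trailing slashes) would still trigger the warning.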