That would be awesome.

matt


> On Oct 7, 2016, at 11:49 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> 
> I was actually going to try and build KenLM into a maven package that can
> be easily distributed.  I haven't had time to work on it too much but I
> think it shouldn't be too hard.
> 
> On Thu, Oct 6, 2016 at 4:16 PM, Matt Post <p...@cs.jhu.edu> wrote:
> 
>> Okay, I've fixed the nonbreaking_prefixes path issue.
>> 
>> The installation should now ignore your value of $JOSHUA entirely,
>> preferring instead the bundled jar and scripts (maybe test this by
>> unsetting $JOSHUA).
>> 
>> New version:
>> 
>>        http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-06.tgz
>> 
>> Please note: my tests show that using BerkeleyLM results in a notable drop
>> in performance (1–2 BLEU points across many test sets). I am worried that
>> we have introduced a bug in LanguageModelFF.java. We use BerkeleyLM so that
>> users don't have to compile KenLM, but we're probably going to need to
>> provide the option to "upgrade" for those willing to try to compile it. Or
>> we'll need a solution for distributing pre-built KenLM shared libraries...
>> 
>> matt
>> 
>> 
>> 
>>> On Oct 5, 2016, at 11:43 PM, John Hewitt <john...@seas.upenn.edu> wrote:
>>> 
>>> Quick further note -- I already had $JOSHUA set to a different directory,
>>> so initially all the lookups were failing.
>>> 
>>> It's possible that current Joshua users will hit the same problem when
>>> they download new language packs. This should be an obvious and quick fix
>>> for the user, but I don't know if there's something we could do in the
>>> name of making it even clearer. (Potentially checking whether $JOSHUA is
>>> the same as $PWD after the directory change in prepare.sh, and printing a
>>> warning if it's not?)
>>> 
>>> -John
>>> 
>>> On Wed, Oct 5, 2016 at 11:32 PM, John Hewitt <john...@seas.upenn.edu> wrote:
>>> 
>>>> Thanks, Matt!
>>>> 
>>>> Some notes:
>>>> 
>>>> When piping input into prepare.sh, I get the following output:
>>>> 
>>>> WARNING: No known abbreviations for language 'es', attempting fall-back to English version...
>>>> ERROR: No abbreviations files found in /nlp/users/johnhew/apache-joshua-es-en-2016-10-05/scripts/preparation/nonbreaking_prefixes
>>>> 
>>>> Seems that line 12 of tokenize.pl:
>>>> my $mydir = "$ENV{JOSHUA}/scripts/preparation/nonbreaking_prefixes";
>>>> should be:
>>>> my $mydir = "$ENV{JOSHUA}/scripts/nonbreaking_prefixes";
>>>> 
>>>> When I make this modification, it works just fine for me.
>>>> Also, tried in server mode -- seems to work without issue.
>>>> 
>>>> (For reference -- executed on an openSUSE cluster)
>>>> 
>>>> -John
>>>> 
>>>> 
>>>> 
>>>> On Wed, Oct 5, 2016 at 10:36 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> I have managed to assemble an actual working language pack. Consider this
>>>>> a (near-final, I hope) draft of what we're rolling out for lots of
>>>>> languages. Please download it, check out the README and associated files,
>>>>> test it, and let me know what's missing or what needs to change.
>>>>> 
>>>>>       http://cs.jhu.edu/~post/files/apache-joshua-es-en-2016-10-05.tgz (2.1 GB)
>>>>> 
>>>>> Suggested use:
>>>>> 
>>>>>       tar xzvf apache-joshua-es-en-2016-10-05.tgz
>>>>>       echo "\"Yo quiero Taco Bell,\", él dijo." \
>>>>>               | ./apache-joshua-es-en-2016-10-05/prepare.sh \
>>>>>               | ./apache-joshua-es-en-2016-10-05/joshua
>>>>> 
>>>>> matt
>>>> 
>>>> 
>>>> 
>> 
>> 
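
A minimal sketch of the check John suggests above: warn when $JOSHUA is set
but points somewhere other than the language pack directory. The function name
warn_if_joshua_mismatch is hypothetical and not part of the pack, and the real
prepare.sh may be structured differently. Since the 2016-10-06 build is meant
to ignore $JOSHUA entirely, a check like this may no longer be needed.

```shell
# Hypothetical helper (not in the actual language pack).
# Takes the language pack directory as its only argument.
# Prints a warning to stderr and returns 1 if $JOSHUA is set to a
# different directory; returns 0 if $JOSHUA is unset, empty, or matches.
warn_if_joshua_mismatch() {
    pack_dir=$1
    if [ -n "${JOSHUA:-}" ] && [ "$JOSHUA" != "$pack_dir" ]; then
        echo "WARNING: \$JOSHUA is '$JOSHUA' but the language pack is in '$pack_dir'; lookups may fail." >&2
        return 1
    fi
    return 0
}
```

In prepare.sh this could be called right after the directory change, for
example as `warn_if_joshua_mismatch "$PWD" || true`, so that the warning never
aborts the run.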
