The file does indeed exist.
Running less on model/lm.berkeleylm warns about a binary file; the first bytes follow:

<AC><ED>^@^Esr^@-edu.berkeley.nlp.lm.ArrayEncodedProbBackoffLm
^@^@^@^@^@^@^@^A^B^@^DJ^@^HnumWordsZ^@^PuseScratchValuesL^@^Cmapt^@
"Ledu/berkeley/nl

p/lm/map/NgramMap;L^@^Fvaluest^@
6Ledu/berkeley/nlp/lm/values/ProbBackoffValueContainer;xr^@
:edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageM

odel^@^@^@^@^@^@^@^A^B^@^@xr^@
.edu.berkeley.nlp.lm.AbstractNgramLanguageModel^@^@^@^@^@^@^@^A^B^@^CI^@^G
lmOrderF^@^NoovWordLogProbL^@^KwordIndexer

I am not doing anything special in the code; I just instantiate the Decoder
from the config file that ships with the language pack. Is there an option
to state explicitly that the file is binary rather than a text ARPA file?
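
For what it's worth, the leading bytes AC ED 00 05 in the dump above match the
Java object serialization stream header (magic 0xACED, version 5), so the file
looks like an uncompressed serialized object rather than ARPA text. Below is a
minimal sketch of how one could check this from Java using only the JDK. The
names LmFileSniffer, Kind and sniff are made up for illustration; this is not
the detection logic that Joshua or BerkeleyLM actually uses, and the ARPA check
only assumes that a text ARPA file starts with a "\data\" line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: guesses the on-disk format of an LM file from its first bytes.
final class LmFileSniffer {

    enum Kind { JAVA_SERIALIZED, GZIP, ARPA_TEXT, UNKNOWN }

    static Kind sniff(Path file) throws IOException {
        // Read up to 16 leading bytes; enough for the magic numbers checked here.
        byte[] head = new byte[16];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }

        // Java serialization streams start with 0xACED (magic) then 0x0005 (version).
        if (n >= 4) {
            int magic = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
            int version = ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
            if (magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                    && version == ObjectStreamConstants.STREAM_VERSION) {
                return Kind.JAVA_SERIALIZED;
            }
        }

        // gzip files start with bytes 0x1f 0x8b; GZIP_MAGIC holds them little-endian.
        if (n >= 2) {
            int gz = (head[0] & 0xFF) | ((head[1] & 0xFF) << 8);
            if (gz == GZIPInputStream.GZIP_MAGIC) {
                return Kind.GZIP;
            }
        }

        // Assumption: a plain-text ARPA file begins (after optional blank lines,
        // as far as they fit in the 16 bytes read) with the "\data\" header line.
        String text = new String(head, 0, n, StandardCharsets.US_ASCII).trim();
        if (text.startsWith("\\data\\")) {
            return Kind.ARPA_TEXT;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        // Example: java LmFileSniffer.java model/lm.berkeleylm
        System.out.println(sniff(Paths.get(args[0])));
    }
}
```

If this reports JAVA_SERIALIZED for model/lm.berkeleylm, the file itself is
probably fine and the open question is why the ARPA reader gets picked for it.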

Thanks Kellen and Matt for your prompt replies.

Regards,
Tommaso


On Mon, 16 Oct 2017 at 20:35, Matt Post <p...@cs.jhu.edu> wrote:

> First I'd check, does the file exist?
>
> It shouldn't be calling ArpaLM. That's for loading plain text files.
> ".berkeleylm" files have been compiled into a special binary format that is
> more efficiently compacted and can be read quickly. There is logic for
> determining which type of file it is, and I wonder if it is going astray.
> Or maybe the file is not what it says it is (can you "head" it)?
>
> matt
>
>
> > On Oct 16, 2017, at 7:08 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> >
> > The feature function initialization message is just a general-purpose
> > exception handler. I’ve seen this quite often when language models fail to
> > load. The most interesting part of the log to me is:
> >
> >> Caused by: java.lang.RuntimeException: Something wrong with I/O.
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parseHeader(ArpaLmReader.java:114)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:76)
> >
> >
> > To me it looks like it could only be caused by the lack of the text
> > "\\1-grams:" in the file you’re opening. Reference this function:
> > https://github.com/smilli/berkeleylm/blob/master/src/edu/berkeley/nlp/lm/io/ArpaLmReader.java#L105
> >
> > Are you trying to load a binary LM with an ARPA reader by any chance?
> > Do you have the quoted text in your text-based LM?
> >
> > -Kellen
> > From: Tommaso Teofili
> > Sent: Monday, October 16, 2017 4:09 PM
> > To: dev@joshua.incubator.apache.org
> > Subject: Re: problems with LM loading
> >
> > p.s.:
> > I've tried with other LPs (e.g. sd-en) and I get the same ...
> >
> > On Mon, 16 Oct 2017 at 15:06, Tommaso Teofili <
> > tommaso.teof...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I am trying to use the ES-EN language pack from our "Language Packs" page
> >> with Joshua 6.1, but when it comes to loading the two language models I get
> >> an I/O exception.
> >> The config looks like:
> >>
> >> feature-function = LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/lm.berkeleylm
> >> feature-function = Distortion
> >> feature-function = LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/en.giga.twopercent.4.lm.berkeleylm
> >> feature-function = PhrasePenalty
> >>
> >> and I get the following:
> >>
> >> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
> >> instantiate feature function 'LanguageModel -lm_type berkeleylm -lm_order 4
> >> -lm_file model/lm.berkeleylm'!
> >>
> >> ...
> >>
> >> Caused by: java.lang.RuntimeException: Unable to instantiate feature
> >> function 'LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file
> >> model/lm.berkeleylm'!
> >>
> >> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:642)
> >> at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:394)
> >> at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
> >>
> >> Caused by: java.lang.reflect.InvocationTargetException: null
> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> >> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> >> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:638)
> >> ... 58 common frames omitted
> >>
> >> Caused by: java.lang.RuntimeException: Something wrong with I/O.
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parseHeader(ArpaLmReader.java:114)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:76)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:18)
> >> at edu.berkeley.nlp.lm.io.LmReaders.firstPassCommon(LmReaders.java:549)
> >> at edu.berkeley.nlp.lm.io.LmReaders.firstPassArpa(LmReaders.java:526)
> >> at edu.berkeley.nlp.lm.io.LmReaders.readArrayEncodedLmFromArpa(LmReaders.java:171)
> >> at edu.berkeley.nlp.lm.io.LmReaders.readArrayEncodedLmFromArpa(LmReaders.java:151)
> >> at org.apache.joshua.decoder.ff.lm.berkeley_lm.LMGrammarBerkeley.<init>(LMGrammarBerkeley.java:94)
> >> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.initializeLM(LanguageModelFF.java:158)
> >> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.<init>(LanguageModelFF.java:132)
> >>
> >> Any hints on what I could be doing wrong? Could it be an encoding problem?
> >> Has anyone else experienced this issue?
> >>
> >> BTW, I am running this from within a Java application; the Decoder is
> >> initialized as follows:
> >>
> >>    JoshuaConfiguration configuration = new JoshuaConfiguration();
> >>    configuration.readConfigFile(pathToJoshuaConfig);
> >>    configuration.use_structured_output = true;
> >>    Decoder decoder = new Decoder(configuration, pathToJoshuaConfig);
> >>
> >> Regards,
> >> Tommaso
> >>
> >
>
>
