The file does exist. "less model/lm.berkeleylm" warns that it is a binary file; the first bytes follow:

<AC><ED>^@^Esr^@-edu.berkeley.nlp.lm.ArrayEncodedProbBackoffLm ^@^@^@^@^@^@^@^A^B^@^DJ^@^HnumWordsZ^@^PuseScratchValuesL^@^Cmapt^@ "Ledu/berkeley/nlp/lm/map/NgramMap;L^@^Fvaluest^@ 6Ledu/berkeley/nlp/lm/values/ProbBackoffValueContainer;xr^@ :edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel^@^@^@^@^@^@^@^A^B^@^@xr^@ .edu.berkeley.nlp.lm.AbstractNgramLanguageModel^@^@^@^@^@^@^@^A^B^@^CI^@^G lmOrderF^@^NoovWordLogProbL^@^KwordIndexer

I am not doing anything special in the code; I just instantiate the Decoder from the config file, which comes from the language pack. Is there an option to state explicitly that the file is a binary and not a text ARPA file?

Thanks, Kellen and Matt, for your prompt replies.

Regards,
Tommaso

On Mon, Oct 16, 2017 at 20:35, Matt Post <p...@cs.jhu.edu> wrote:

> First I'd check, does the file exist?
>
> It shouldn't be calling ArpaLM. That's for loading plain text files.
> ".berkeleylm" files have been compiled into a special binary format that is
> more efficiently compacted and can be read quickly. There is logic for
> determining which type of file it is, and I wonder if it is going astray.
> Or maybe the file is not what it says it is (can you "head" it)?
>
> matt
>
> > On Oct 16, 2017, at 7:08 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> >
> > The feature function initialization message is just a general-purpose
> > exception handler. I’ve seen this quite often when language models fail to
> > load. The most interesting part of the log to me is:
> >
> >> Caused by: java.lang.RuntimeException: Something wrong with I/O.
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parseHeader(ArpaLmReader.java:114)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:76)
> >
> > To me it looks like it could only be caused by the lack of the text
> > "\\1-grams:" in the file you’re opening.
> > Reference this function:
> > https://github.com/smilli/berkeleylm/blob/master/src/edu/berkeley/nlp/lm/io/ArpaLmReader.java#L105
> >
> > Are you trying to load a binary LM with an ARPA reader by any chance?
> > Do you have the quoted text in your text-based LM?
> >
> > -Kellen
> >
> > From: Tommaso Teofili
> > Sent: Monday, October 16, 2017 4:09 PM
> > To: dev@joshua.incubator.apache.org
> > Subject: Re: problems with LM loading
> >
> > p.s.:
> > I've tried with other LPs (e.g. sd-en) and I get the same ...
> >
> > On Mon, Oct 16, 2017 at 15:06, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I am trying to use the ES-EN language pack from our "Language Packs" page
> >> with Joshua 6.1, but when I get to loading the two language models I get an
> >> IO exception.
> >> The config looks like:
> >>
> >> feature-function = LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/lm.berkeleylm
> >> feature-function = Distortion
> >> feature-function = LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/en.giga.twopercent.4.lm.berkeleylm
> >> feature-function = PhrasePenalty
> >>
> >> and I get the following:
> >>
> >> java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate feature function 'LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/lm.berkeleylm'!
> >>
> >> ...
> >>
> >> Caused by: java.lang.RuntimeException: Unable to instantiate feature function 'LanguageModel -lm_type berkeleylm -lm_order 4 -lm_file model/lm.berkeleylm'!
> >> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:642)
> >> at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:394)
> >> at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
> >>
> >> Caused by: java.lang.reflect.InvocationTargetException: null
> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> >> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> >> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(Decoder.java:638)
> >> ... 58 common frames omitted
> >>
> >> Caused by: java.lang.RuntimeException: Something wrong with I/O.
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parseHeader(ArpaLmReader.java:114)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:76)
> >> at edu.berkeley.nlp.lm.io.ArpaLmReader.parse(ArpaLmReader.java:18)
> >> at edu.berkeley.nlp.lm.io.LmReaders.firstPassCommon(LmReaders.java:549)
> >> at edu.berkeley.nlp.lm.io.LmReaders.firstPassArpa(LmReaders.java:526)
> >> at edu.berkeley.nlp.lm.io.LmReaders.readArrayEncodedLmFromArpa(LmReaders.java:171)
> >> at edu.berkeley.nlp.lm.io.LmReaders.readArrayEncodedLmFromArpa(LmReaders.java:151)
> >> at org.apache.joshua.decoder.ff.lm.berkeley_lm.LMGrammarBerkeley.<init>(LMGrammarBerkeley.java:94)
> >> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.initializeLM(LanguageModelFF.java:158)
> >> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.<init>(LanguageModelFF.java:132)
> >>
> >> Any hints on what I could be doing wrong? Encoding?
> >> Did anyone else experience such an issue?
> >>
> >> BTW I am running this from within a Java application, Decoder is
> >> initialized as follows:
> >>
> >> JoshuaConfiguration configuration = new JoshuaConfiguration();
> >> configuration.readConfigFile(pathToJoshuaConfig);
> >> configuration.use_structured_output = true;
> >> Decoder decoder = new Decoder(configuration, pathToJoshuaConfig);
> >>
> >> Regards,
> >> Tommaso
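
P.S. On the binary-versus-ARPA question: the dump at the top of this message starts with AC ED 00 05, which is the standard header of a Java-serialized object stream, so one way to double-check a model file outside of Joshua is to look at its first four bytes. The sketch below is only an illustration. LmFormatSniffer and looksLikeJavaSerialized are made-up names (not part of Joshua or BerkeleyLM), and it assumes the binary model is stored as plain, uncompressed Java serialization, as the dump suggests.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectStreamConstants;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Joshua or BerkeleyLM.
class LmFormatSniffer {

    /**
     * Returns true if the stream begins with the Java serialization
     * header (magic 0xACED followed by version 0x0005).
     */
    static boolean looksLikeJavaSerialized(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            short magic = data.readShort();
            short version = data.readShort();
            return magic == ObjectStreamConstants.STREAM_MAGIC
                    && version == ObjectStreamConstants.STREAM_VERSION;
        } catch (EOFException e) {
            // Fewer than four bytes: cannot be a serialized object stream.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LmFormatSniffer <lm-file>");
            return;
        }
        // The path comes from the command line, e.g. model/lm.berkeleylm.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(looksLikeJavaSerialized(in)
                    ? "Java-serialized binary"
                    : "not Java-serialized (possibly ARPA text)");
        }
    }
}
```

A text ARPA file would fail this check because it starts with readable text rather than the serialization header. The check says nothing about how Joshua itself decides which reader to use.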