This is a valid FSA file, but it is not a valid encoding for the dictionary you're trying to dump, Jan. That's why you're getting an exception. For example, take this entry:

AAA+I

With the SUFFIX encoder (which your .info file implicitly picks), this means truncating 8 bytes from the sequence, which is clearly wrong. It seems to me that you have data that shouldn't be encoded with anything (and isn't) -- perhaps the LT colleagues can follow up on this one.

The wiki page at:

http://wiki.languagetool.org/hunspell-support

should indeed clarify the encoder property for the associated .info file as:

fsa.dict.encoder=NONE

If you comment out these obsolete properties in your .info file:

#fsa.dict.uses-prefixes=false
#fsa.dict.uses-infixes=false

and add the one above, the dictionary dumps just fine.

In any case, you can always dump *any* FSA dictionary without applying the decoding routines; just use:

java -jar morfologik-tools-1.10.0-SNAPSHOT-standalone.jar fsa_dump -d <dict> --raw-data

If you do want to "decode" the data, pass an additional "-x", although if the underlying data doesn't make sense, exceptions may occur (for performance reasons, no runtime checks are done to verify sanity).

Dawid

On Mon, Oct 13, 2014 at 3:59 PM, Jan Schreiber <jan.schrei...@languagetool.org> wrote:
> In case anyone's interested in the exported plain text file, it is here:
> http://sourceforge.net/projects/germandict/files/Morfologik/de_frequency.7z
>
> I sorted the words by frequency class and additionally sorted the
> largest "A" class of least frequent words by word length.
>
> The frequency distribution for the first 200,000 words looks fairly
> plausible, but the vast majority (about 1.4 million word forms) is
> lumped together in one huge class.
>
> Ruud, you said you have larger frequency data sets available for most of
> the languages. If you happen to have data for German available I would
> love to have it, ideally in the gaia format so I don't have to hassle
> with converting it. But a tab-separated list or something like that
> would also be great.
>
> --Jan
>
> On 12.10.2014 18:18, Jan Schreiber wrote:
>> I figured out how to dump the dictionary.
>> All I had to do was create a hunspell subfolder and move the binary
>> dictionary into it, then the exporting process worked as advertised.
>
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
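
To make the SUFFIX arithmetic from the top of this message concrete, here is a minimal Java sketch of why an entry like AAA+I cannot be decoded. It is not Morfologik's actual code: the class and method names are hypothetical, the "walked"/"Cs" pair is an invented example, and it assumes the control byte maps 'A' to 0, 'B' to 1, and so on (which is consistent with 'I' meaning 8). The bounds check is added purely for illustration; as noted above, the real decoder does no such sanity check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of the Morfologik API.
public class SuffixDecodeSketch {

    /** Trailing bytes to drop from the word, assuming 'A' encodes 0, 'B' 1, and so on. */
    static int truncateCount(byte control) {
        return (control & 0xFF) - 'A';
    }

    /**
     * Rebuilds a stem from a word and its encoded form (control byte + suffix).
     * The bounds check exists here only to show where bad data goes wrong.
     */
    static String decodeStem(String word, String encoded) {
        byte[] w = word.getBytes(StandardCharsets.UTF_8);
        byte[] e = encoded.getBytes(StandardCharsets.UTF_8);
        if (e.length == 0) {
            throw new IllegalArgumentException("missing control byte");
        }
        int truncate = truncateCount(e[0]);
        if (truncate < 0 || truncate > w.length) {
            throw new IllegalArgumentException(
                "cannot truncate " + truncate + " bytes from a " + w.length + "-byte word");
        }
        int kept = w.length - truncate;
        // Keep the word's prefix, then append the suffix that follows the control byte.
        byte[] stem = Arrays.copyOf(w, kept + e.length - 1);
        System.arraycopy(e, 1, stem, kept, e.length - 1);
        return new String(stem, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeStem("walked", "Cs"));   // prints "walks"
        System.out.println(truncateCount((byte) 'I'));    // prints 8
        try {
            decodeStem("AAA", "I");                        // 8 bytes from a 3-byte word
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

With data that was never suffix-encoded, the byte after the separator is just ordinary payload, so reading it as a truncation count gives nonsense like the 8 above. That is why switching the encoder to NONE or dumping with --raw-data avoids the exception.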