Re: switching from Hunspell to Morfologik
For most of those languages, the frequency files I made are a lot more extensive than those on the gaia site. If you need them, just tell me. I can easily convert my frequency list to the gaia format.

Ruud

On 2014-10-11 12:00, Daniel Naber wrote:
> Hi,
>
> to provide LT as a 100% pure Java software, I'd like to switch from
> Hunspell (native code) to Morfologik (Java-based). For that, I think the
> following languages are easy to switch:
>
> Asturian
> Galician
> Khmer
> Spanish
> Tagalog
> Esperanto
> Icelandic
>
> Does anybody see a problem with me switching those languages to
> Morfologik? For Esperanto and Icelandic this would also have the benefit
> that they can then offer suggestions for typos. Does anybody see a
> problem with that?
>
> Other languages (fr, de, sv, pt-BR) might not be easy to switch, please
> see https://github.com/languagetool-org/languagetool/issues/199 for
> details.
>
> Regards
> Daniel

--
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
Re: switching from Hunspell to Morfologik
Hi Daniel,

you may want to check out Hunspell's support in Lucene, it's purely Java-based.

https://lucene.apache.org/core/4_10_1/analyzers-common/org/apache/lucene/analysis/hunspell/package-summary.html

Perhaps it'll help you ease the transition (or avoid it completely?).

Dawid
Re: switching from Hunspell to Morfologik
On 2014-10-11 13:57, Dawid Weiss wrote:
> you may want to check out Hunspell's support in Lucene, it's purely
> Java-based.
> https://lucene.apache.org/core/4_10_1/analyzers-common/org/apache/lucene/analysis/hunspell/package-summary.html
> Perhaps it'll help you ease the transition (or avoid it completely?).

It seems to support only stemming, not spell checking and creating suggestions, unless I'm missing something.

Regards
Daniel
Re: switching from Hunspell to Morfologik
That's right, I didn't know what it was you needed exactly.

D.
Re: switching from Hunspell to Morfologik
Hi,

I tried creating an FSA dictionary with frequency information from my German word list [1] as detailed on the Wiki [2]. The creation process apparently worked all right, but when I tried to dump the binary dictionary to a list with the command

    java -cp languagetool.jar org.languagetool.dev.DictionaryExporter de-DE.dict

I got the following error in LT 2.7:

    Unhandled program error occurred. Invoke with '--help' for help.
    java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkBounds(Buffer.java:559)
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:181)
        at morfologik.stemming.DictionaryLookup.decodeBaseForm(DictionaryLookup.java:312)
        at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:95)
        at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:15)
        at morfologik.tools.FSADumpTool.dump(FSADumpTool.java:171)
        at morfologik.tools.FSADumpTool.go(FSADumpTool.java:75)
        at morfologik.tools.Tool.go(Tool.java:45)
        at morfologik.tools.FSADumpTool.main(FSADumpTool.java:285)
        at org.languagetool.dev.DictionaryExporter.main(DictionaryExporter.java:41)

The binary dictionary with frequency data that I created can be found at [3].

Best,
Jan

[1] http://sourceforge.net/projects/germandict/
[2] http://wiki.languagetool.org/hunspell-support
[3] http://sourceforge.net/projects/germandict/files/Morfologik/
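
As background on why frequency data in the dictionary matters for typo suggestions, here is a minimal sketch of frequency-aware ranking. It is not Morfologik's or LanguageTool's actual suggestion code. The class name, the method names, and the frequency values are all made up for illustration, and plain Levenshtein distance stands in for whatever distance measure the real speller uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not Morfologik's or LanguageTool's suggestion code.
class FrequencyRankedSuggestions {

    // Classic Levenshtein edit distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns dictionary words within maxDistance of the typo:
    // closest first, ties broken by higher frequency, then alphabetically.
    static List<String> suggest(String typo, Map<String, Integer> frequencies, int maxDistance) {
        List<String> candidates = new ArrayList<>();
        for (String word : frequencies.keySet()) {
            if (editDistance(typo, word) <= maxDistance) {
                candidates.add(word);
            }
        }
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(typo, x), editDistance(typo, y));
            if (byDistance != 0) {
                return byDistance;
            }
            int byFrequency = Integer.compare(frequencies.get(y), frequencies.get(x));
            if (byFrequency != 0) {
                return byFrequency;
            }
            return x.compareTo(y);
        });
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical frequency values, invented for this example.
        Map<String, Integer> freq = Map.of("Haus", 20, "Maus", 12, "Hals", 9, "Hausboot", 3);
        System.out.println(suggest("Xaus", freq, 1)); // prints [Haus, Maus]
    }
}
```

Without frequency data, "Haus" and "Maus" would be indistinguishable here because both are one edit away from the typo. The frequency value is what puts the more common word first.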
Re: switching from Hunspell to Morfologik
Hi,

I wonder if we could use the switch to Morfologik as an opportunity to rethink our general approach to dictionaries. Currently we use two dictionaries for all the fully supported languages afaik, and those contribute considerably to the large download size of LanguageTool. Why not use just one dictionary per language and keep all the necessary data in one well organised place?

This large word database could contain everything we need: tags and base form for the grammar checking, frequency information for the spelling suggestions, just about everything. Even if we want the dictionary to contain incorrect/outdated spellings for tagging purposes, all we need is a one-bit flag that tells the spell-checking routine if a word is misspelled.

Cheers,
Jan
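
The single-dictionary proposal can be sketched roughly as follows. This is a hypothetical illustration, not existing LanguageTool or Morfologik code: the class `UnifiedDictionary`, its `Entry` fields, the tag string "SUB", and the frequency values are all assumptions chosen for the example. An in-memory map stands in for a compact binary dictionary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "one dictionary per language" idea.
class UnifiedDictionary {

    // One record per word form, holding what both tagging and spell checking need.
    static final class Entry {
        final String form;        // surface form as it appears in text
        final String baseForm;    // base form used by grammar rules
        final String tag;         // part-of-speech tag
        final int frequency;      // frequency value for ranking suggestions
        final boolean misspelled; // one-bit flag: known to the tagger, rejected by the speller

        Entry(String form, String baseForm, String tag, int frequency, boolean misspelled) {
            this.form = form;
            this.baseForm = baseForm;
            this.tag = tag;
            this.frequency = frequency;
            this.misspelled = misspelled;
        }
    }

    private final Map<String, List<Entry>> entries = new HashMap<>();

    void add(Entry entry) {
        entries.computeIfAbsent(entry.form, key -> new ArrayList<>()).add(entry);
    }

    // Tagger view: every reading, including outdated or incorrect spellings.
    List<Entry> readings(String form) {
        return entries.getOrDefault(form, List.of());
    }

    // Speller view: a form is accepted only if at least one reading is not flagged.
    boolean isCorrectlySpelled(String form) {
        for (Entry entry : readings(form)) {
            if (!entry.misspelled) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UnifiedDictionary dict = new UnifiedDictionary();
        dict.add(new Entry("Schifffahrt", "Schifffahrt", "SUB", 12, false));
        // Outdated spelling: kept for tagging, flagged for the speller.
        dict.add(new Entry("Schiffahrt", "Schifffahrt", "SUB", 4, true));

        System.out.println(dict.isCorrectlySpelled("Schifffahrt")); // prints true
        System.out.println(dict.isCorrectlySpelled("Schiffahrt"));  // prints false
        System.out.println(dict.readings("Schiffahrt").get(0).baseForm); // prints Schifffahrt
    }
}
```

The point of the flag is that the tagger and the speller read the same entries and differ only in whether they filter on `misspelled`, so no second word list is needed.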