Re: switching from Hunspell to Morfologik

2014-10-11 Thread R.J. Baars
For most of those languages, the frequency files I made are a lot more
extensive than those on the gaia site.

If you need them, just tell me. I can easily convert my frequency list to
the gaia format.

Ruud

 Hi,

 to provide LT as a 100% pure Java software, I'd like to switch from
 Hunspell (native code) to Morfologik (Java-based). For that, I think the
 following languages are easy to switch:

  Asturian
  Galician
  Khmer
  Spanish
  Tagalog
  Esperanto
  Icelandic

 Does anybody see a problem with me switching those languages to
 Morfologik? For Esperanto and Icelandic this would also have the benefit
 that they can then offer suggestions for typos.

 Does anybody see a problem with that? Other languages (fr, de, sv,
 pt-BR) might not be easy to switch, please see
 https://github.com/languagetool-org/languagetool/issues/199 for details.

 Regards
   Daniel


 --
 Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
 Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
 Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
 Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
 http://p.sf.net/sfu/Zoho
 ___
 Languagetool-devel mailing list
 Languagetool-devel@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/languagetool-devel






Re: switching from Hunspell to Morfologik

2014-10-11 Thread Dawid Weiss
Hi Daniel,

you may want to check out Hunspell's support in Lucene, it's purely Java-based.

https://lucene.apache.org/core/4_10_1/analyzers-common/org/apache/lucene/analysis/hunspell/package-summary.html

Perhaps it'll help you ease the transition (or avoid it completely?).

Dawid



Re: switching from Hunspell to Morfologik

2014-10-11 Thread Daniel Naber
On 2014-10-11 13:57, Dawid Weiss wrote:

 you may want to check out Hunspell's support in Lucene, it's purely 
 Java-based.
 
 https://lucene.apache.org/core/4_10_1/analyzers-common/org/apache/lucene/analysis/hunspell/package-summary.html
 
 Perhaps it'll help you ease the transition (or avoid it completely?).

It seems to support only stemming, not spell checking and creating 
suggestions, unless I'm missing something.

Regards
  Daniel




Re: switching from Hunspell to Morfologik

2014-10-11 Thread Dawid Weiss
That's right, I didn't know what exactly it was that you needed.

D.



Re: switching from Hunspell to Morfologik

2014-10-11 Thread Jan Schreiber
Hi,

I tried creating an FSA dictionary with frequency information from my
German word list[1], as detailed on the Wiki[2].

The creation process apparently worked, but when I tried to dump the
binary dictionary to a list with the command

java -cp languagetool.jar org.languagetool.dev.DictionaryExporter de-DE.dict

I got the following error in LT 2.7:

Unhandled program error occurred.
Invoke with '--help' for help.
java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:559)
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:181)
    at morfologik.stemming.DictionaryLookup.decodeBaseForm(DictionaryLookup.java:312)
    at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:95)
    at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:15)
    at morfologik.tools.FSADumpTool.dump(FSADumpTool.java:171)
    at morfologik.tools.FSADumpTool.go(FSADumpTool.java:75)
    at morfologik.tools.Tool.go(Tool.java:45)
    at morfologik.tools.FSADumpTool.main(FSADumpTool.java:285)
    at org.languagetool.dev.DictionaryExporter.main(DictionaryExporter.java:41)

The binary dictionary with frequency data that I created can be found at
[3].

Best,
Jan

[1] http://sourceforge.net/projects/germandict/
[2] http://wiki.languagetool.org/hunspell-support
[3] http://sourceforge.net/projects/germandict/files/Morfologik/



Re: switching from Hunspell to Morfologik

2014-10-11 Thread Jan Schreiber
Hi,

I wonder if we could use the switch to Morfologik as an opportunity to
rethink our general approach to dictionaries.

Currently we use two dictionaries for all the fully supported languages
afaik, and those contribute considerably to the large download size of
LanguageTool.

Why not use just one dictionary per language and keep all the necessary
data in one well organised place? This large word database could contain
everything we need: tags and base form for the grammar checking,
frequency information for the spelling suggestions, just about
everything. Even if we want the dictionary to contain incorrect/outdated
spellings for tagging purposes, all we need is a one-bit flag that tells
the spell-checking routine if a word is misspelled.
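
To make the idea concrete, here is a minimal sketch of what such a
combined entry might look like. It is only an illustration: the class,
field and method names are hypothetical and exist neither in LanguageTool
nor in Morfologik, the tag and frequency values are made up, and it
assumes one reading per word form, which a real dictionary would not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in LanguageTool or Morfologik.
final class UnifiedDictionarySketch {

    /** One record per word form, shared by the tagger and the spell checker. */
    static final class Entry {
        final String baseForm;    // lemma, used for grammar checking
        final String posTag;      // part-of-speech tag, used for grammar checking
        final int frequency;      // relative frequency, usable for ranking suggestions
        final boolean misspelled; // one-bit flag: known form, but not an accepted spelling

        Entry(String baseForm, String posTag, int frequency, boolean misspelled) {
            this.baseForm = baseForm;
            this.posTag = posTag;
            this.frequency = frequency;
            this.misspelled = misspelled;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void add(String form, Entry entry) {
        entries.put(form, entry);
    }

    /** Tagger view: returns the entry for any known form, or null if unknown. */
    Entry lookup(String form) {
        return entries.get(form);
    }

    /** Speller view: a word is accepted only if it is known and not flagged. */
    boolean isCorrectlySpelled(String form) {
        Entry e = entries.get(form);
        return e != null && !e.misspelled;
    }

    /** Builds a two-entry example; tag and frequency values are illustrative only. */
    static UnifiedDictionarySketch sample() {
        UnifiedDictionarySketch dict = new UnifiedDictionarySketch();
        dict.add("dass", new Entry("dass", "KON", 900, false));
        // "da\u00DF" is the outdated spelling: still taggable, but flagged for the speller.
        dict.add("da\u00DF", new Entry("dass", "KON", 5, true));
        return dict;
    }

    public static void main(String[] args) {
        UnifiedDictionarySketch dict = sample();
        for (String word : new String[] {"dass", "da\u00DF", "dsas"}) {
            System.out.println(word + ": known=" + (dict.lookup(word) != null)
                    + ", correct=" + dict.isCorrectlySpelled(word));
        }
    }
}
```

The point of the flag is that the tagger and the speller read the same
entry: the outdated form still gets its base form and tag, while the
spell-checking routine rejects it.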

Cheers,
Jan


On 11.10.2014 12:00, Daniel Naber wrote:
 Hi,
 
 to provide LT as a 100% pure Java software, I'd like to switch from 
 Hunspell (native code) to Morfologik (Java-based).
