Thanks, Marcin.

Some remarks. The improvements I sent to the list 15 days ago have not
been included, and I have since found more bugs.

I attach the code I'm using now and briefly explain the reasons for the changes.

- In the getAllReplacements method we need to make sure that the
replacements are done from left to right. We must complete the
for-loop over the replacement pairs, choose the first possible
replacement (from left to right), and only then start the two new
branches (with and without the replacement). Otherwise, some
replacements are never done.

- If "ss" is a key in the replacement pairs and somebody uses a long
string of s ("ssssssssss...") as input text, the method can consume
all the memory, because the algorithm is exponential
(2^(number of replacements)). This happened to us on an online
server, and the LT server crashed. The depth of the recursive
algorithm should be limited to 4 or 5 levels at most.

- It is possible that different "words to check" give the same
suggestion. So at some point we need to remove duplicates. I do this
at the end of findReplacements().

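- The conditions around line 238 (current GitHub version, 1.7) are not
correct. Because all the operands are joined with &&, the first
isInDictionary call already requires the word to be in the dictionary
exactly as written, which makes the lowercase conversion useless: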

                    if (isInDictionary(wordChecked)
                            && dictionaryMetadata.isConvertingCase()
                            && isMixedCase(wordChecked)
                            && isInDictionary(wordChecked.toLowerCase(dictionaryMetadata.getLocale())))

I think they should be something like:

          if (isInDictionary(wordChecked)
              || (dictionaryMetadata.convertCase
              && isMixedCase(wordChecked)
              && isInDictionary(wordChecked
                  .toLowerCase(dictionaryMetadata.dictionaryLocale))))
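
A minimal sketch of the first two points, in Java. It is not the actual morfologik code: the class name, the Map<String, List<String>> representation of the replacement pairs, this getAllReplacements signature and MAX_DEPTH are all hypothetical. The loop over all pairs finishes before any branching, so the leftmost match is always handled first, and every branch adds one level of depth, so the number of variants stays bounded however long the input is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReplacementSketch {

    // Hypothetical cap on recursion depth (4 or 5 levels is the suggestion).
    static final int MAX_DEPTH = 4;

    /**
     * Returns the word itself plus the variants produced by applying the
     * replacement pairs from left to right, starting at position {@code from}.
     * Assumes every key is a non-empty string.
     */
    static List<String> getAllReplacements(String word, int from, int depth,
                                           Map<String, List<String>> pairs) {
        List<String> results = new ArrayList<>();
        if (depth >= MAX_DEPTH) {
            // Depth limit reached: stop branching and keep the word as it is.
            results.add(word);
            return results;
        }
        // Complete the loop over ALL pairs first to find the leftmost match.
        int pos = -1;
        for (String key : pairs.keySet()) {
            int idx = word.indexOf(key, from);
            if (idx >= 0 && (pos < 0 || idx < pos)) {
                pos = idx;
            }
        }
        if (pos < 0) {
            // No more matches to the right: this variant is complete.
            results.add(word);
            return results;
        }
        // Branch 1: no replacement at pos; keep scanning from pos + 1.
        results.addAll(getAllReplacements(word, pos + 1, depth + 1, pairs));
        // Branch 2: one branch per key matching at pos and per replacement
        // value; scanning resumes right after the inserted text.
        for (Map.Entry<String, List<String>> pair : pairs.entrySet()) {
            String key = pair.getKey();
            if (!word.startsWith(key, pos)) {
                continue;
            }
            for (String value : pair.getValue()) {
                String replaced = word.substring(0, pos) + value
                        + word.substring(pos + key.length());
                results.addAll(getAllReplacements(replaced, pos + value.length(),
                        depth + 1, pairs));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pairs = new LinkedHashMap<>();
        pairs.put("a", List.of("b"));
        System.out.println(getAllReplacements("aa", 0, 0, pairs)); // [aa, ab, ba, bb]

        // A pathological input no longer explodes: 4 levels give 2^4 variants.
        pairs.clear();
        pairs.put("ss", List.of("s"));
        System.out.println(getAllReplacements("s".repeat(1000), 0, 0, pairs).size()); // 16
    }
}
```

In the second call only five of the 16 variants are distinct (strings of 996 to 1000 s), which also shows why duplicates have to be removed at some point.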

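For the point about duplicates, a minimal sketch of removing repeated suggestions once all the words to check have been processed. The helper name is hypothetical, and it assumes the suggestions arrive as a list of strings already in the order they should be presented; a LinkedHashSet drops repeats while keeping the first occurrence of each.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class SuggestionDedup {

    // Hypothetical helper: keeps the first occurrence of each suggestion
    // and preserves the original order of the list.
    static List<String> removeDuplicates(List<String> suggestions) {
        return new ArrayList<>(new LinkedHashSet<>(suggestions));
    }

    public static void main(String[] args) {
        // Two different "words to check" produced the same suggestion "casa".
        List<String> raw = List.of("casa", "cosa", "casa", "cassa");
        System.out.println(removeDuplicates(raw)); // [casa, cosa, cassa]
    }
}
```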

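The difference between the two conditions can be checked in isolation. In this sketch the dictionary is a plain Set<String>, and the class, its fields and this isMixedCase definition are hypothetical stand-ins for the speller and its dictionary metadata; only the shape of the two boolean expressions is taken from the snippets above.

```java
import java.util.Locale;
import java.util.Set;

class DictionaryCheckSketch {

    // Hypothetical stand-ins for the dictionary and its metadata.
    private final Set<String> dictionary;
    private final boolean convertCase;
    private final Locale locale;

    DictionaryCheckSketch(Set<String> dictionary, boolean convertCase, Locale locale) {
        this.dictionary = dictionary;
        this.convertCase = convertCase;
        this.locale = locale;
    }

    boolean isInDictionary(String word) {
        return dictionary.contains(word);
    }

    // Assumed definition: not all upper case, not all lower case,
    // and not capitalized (upper-case first letter, lower-case rest).
    boolean isMixedCase(String word) {
        if (word.isEmpty()) {
            return false;
        }
        String rest = word.substring(1);
        boolean allUpper = word.equals(word.toUpperCase(locale));
        boolean allLower = word.equals(word.toLowerCase(locale));
        boolean capitalized = Character.isUpperCase(word.charAt(0))
                && rest.equals(rest.toLowerCase(locale));
        return !allUpper && !allLower && !capitalized;
    }

    // Current shape: everything is joined with &&, so the exact form must
    // already be in the dictionary and the lower-case lookup adds nothing.
    boolean acceptedCurrent(String word) {
        return isInDictionary(word)
                && convertCase
                && isMixedCase(word)
                && isInDictionary(word.toLowerCase(locale));
    }

    // Proposed shape: the exact form, OR (with case conversion enabled)
    // the lower-cased form of a mixed-case word.
    boolean acceptedProposed(String word) {
        return isInDictionary(word)
                || (convertCase
                    && isMixedCase(word)
                    && isInDictionary(word.toLowerCase(locale)));
    }
}
```

With a dictionary containing only "casa" and case conversion enabled, acceptedCurrent("CaSa") is false while acceptedProposed("CaSa") is true.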
Regards,
Jaume Ortolà
www.riuraueditors.cat



2013/7/15 Marcin Miłkowski <list-addr...@wp.pl>:
> On 2013-07-15 10:56, Marcin Miłkowski wrote:
>> Hi,
>>
>> Dawid just released morfologik 1.7 on Maven. So we can actually go on
>> and include a newer version in LT.
>>
>> The new version still does not support compounding but it has all the
>> features required for getting better diacritic suggestions.
>
> Here's the documentation:
>
> http://wiki.languagetool.org/hunspell-support#toc5
>
> Best,
> Marcin
>
>
>> Best,
>> Marcin
>>
>> On 2013-07-02 08:59, Marcin Miłkowski wrote:
>>> On 2013-07-02 01:11, Jaume Ortolà i Font wrote:
>>>> Hi Marcin,
>>>>
>>>> I have been using the still-unreleased code of morfologik-stemming, and I
>>>> have made improvements to Speller.java for some previously unforeseen
>>>> cases. See the attachment.
>>>>
>>>> In order to complete the development, and to test and debug with all
>>>> languages, perhaps we could temporarily include the morfologik module
>>>> inside LanguageTool. This would make things easier. What do you think?
>>>
>>> No. I should make a release; forking morfologik makes no sense to me.
>>>
>>> The only thing that stops me is the lack of time to work on compounds.
>>>
>>> Best,
>>> Marcin
>>>
>>> _______________________________________________
>>> Languagetool-devel mailing list
>>> Languagetool-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>
>>

Attachment: Speller.java
Description: Binary data
