On 2013-07-15 12:41, Jaume Ortolà i Font wrote:
> Thanks, Marcin.
>
> Some remarks. The improvements I sent to the list 15 days ago have not
> been added, and moreover I have found more bugs.

I'm really sorry, but there have been 200 mails on the mailing list over
the last two weeks and I have been away from my e-mail. Could you please
add your changes as issues on GitHub for morfologik-stemming? That would
make it much easier for us to track these things.

>
> I attach the code I'm using now and explain briefly the reasons for the 
> changes.
>
> - In the getAllReplacements method we need to make sure that the
> replacements are done from left to right. We must complete the
> for-loop of the replacement pairs, choose the first possible
> replacement (from left to right) and then start the two new branches
> (with and without replacement). Otherwise, some replacements are not
> done.

OK, this sounds OK. I integrated your changes.
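
To make the left-to-right rule concrete, here is a minimal sketch of the
selection step only. It is not the code from Speller.java: the class name,
the Match holder and findLeftmost() are hypothetical, and it assumes the
replacement pairs are a plain Map with non-empty keys. The point is that
the loop over all pairs runs to completion before anything is replaced,
and the key with the smallest index wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LeftmostMatchSketch {
    /** Earliest match of any replacement key at or after a given offset. */
    static final class Match {
        final int index;
        final String key;
        final String replacement;

        Match(int index, String key, String replacement) {
            this.index = index;
            this.key = key;
            this.replacement = replacement;
        }
    }

    /**
     * Scans ALL replacement pairs and returns the one that occurs first
     * (leftmost) in the word, or null if none occurs at or after 'from'.
     */
    static Match findLeftmost(String word, int from, Map<String, String> pairs) {
        Match best = null;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            int idx = word.indexOf(e.getKey(), from);
            if (idx >= 0 && (best == null || idx < best.index)) {
                best = new Match(idx, e.getKey(), e.getValue());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("b", "B"); // listed first, but occurs later in the word
        pairs.put("a", "A");
        Match m = findLeftmost("xab", 0, pairs);
        System.out.println(m.index + " " + m.key + " -> " + m.replacement);
    }
}
```

The two branches (with and without the replacement) would then both
continue from this leftmost position.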

> - If there is "ss" as a key in the replacement pairs, and somebody
> uses a long string of s ("ssssssssss...") as input text, this can
> cause the method to consume all the memory, as the algorithm is
> exponential (2^(number of replacements)). This happened to us in an
> online server, and the LT server crashed. The depth of the recursive
> algorithm should be limited to 4 or 5 levels at most.

Is that in getAllReplacements()?
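
For reference, a depth cap of the kind described could look roughly like
the sketch below. This is an illustration under assumptions, not the
actual getAllReplacements(): the signature, the MAX_DEPTH constant (set
to 4 here, following the "4 or 5 levels" suggestion) and the helper names
are made up, and keys are assumed to be non-empty. Each level branches
into "skip" and "replace", so the number of results is bounded by
2^MAX_DEPTH regardless of input length.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BoundedReplacementsSketch {
    /** Assumed cap on recursion depth; at most 2^MAX_DEPTH results. */
    static final int MAX_DEPTH = 4;

    static List<String> getAllReplacements(String word, Map<String, String> pairs) {
        List<String> out = new ArrayList<>();
        expand(word, 0, 0, pairs, out);
        return out;
    }

    private static void expand(String word, int from, int depth,
                               Map<String, String> pairs, List<String> out) {
        int bestIdx = -1;
        String bestKey = null;
        if (depth < MAX_DEPTH) {
            // Complete the loop over all pairs, keeping the leftmost match.
            for (String key : pairs.keySet()) {
                int idx = word.indexOf(key, from);
                if (idx >= 0 && (bestIdx < 0 || idx < bestIdx)) {
                    bestIdx = idx;
                    bestKey = key;
                }
            }
        }
        if (bestKey == null) {
            // No match left, or depth limit reached: emit the word as it is.
            out.add(word);
            return;
        }
        // Branch 1: leave this occurrence alone and move one character on.
        expand(word, bestIdx + 1, depth + 1, pairs, out);
        // Branch 2: apply the replacement and continue after it.
        String value = pairs.get(bestKey);
        String replaced = word.substring(0, bestIdx) + value
                + word.substring(bestIdx + bestKey.length());
        expand(replaced, bestIdx + value.length(), depth + 1, pairs, out);
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("ss", "z");
        // A long run of 's' no longer explodes: the result count is capped.
        System.out.println(getAllReplacements("ssssssssssssssssssss", pairs).size());
    }
}
```

The trade-off is that replacements beyond the cap are silently not
explored, which seems acceptable for suggestions but should be checked
against real dictionaries.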

> - It is possible that different "words to check" give the same
> suggestion. So at some point we need to remove duplicates. I do this
> at the end of findReplacements().

You are right. We could probably write the same code in a slightly more 
elegant way, without converting this to a LinkedHashSet but simply by 
adding to a set when iterating the list.
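
Something along these lines, as a rough sketch (the class and method
names are placeholders, and the candidates are assumed to be plain
strings rather than whatever type findReplacements() really uses).
Set.add() returns false for an element that is already present, so one
pass keeps the first occurrence of each suggestion and preserves order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupSketch {
    /** Returns the candidates without duplicates, keeping first-seen order. */
    static List<String> uniqueInOrder(List<String> candidates) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            // add() is false when the candidate was already seen.
            if (seen.add(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(uniqueInOrder(Arrays.asList("casa", "cosa", "casa", "cas")));
    }
}
```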

>
> - The conditions around line 238 (current github version 1.7) are not
> correct. The first isInDictionary makes the lower case conversion
> useless:
>
>                      if (isInDictionary(wordChecked)
>                              && dictionaryMetadata.isConvertingCase()
>                              && isMixedCase(wordChecked)
>                              &&
> isInDictionary(wordChecked.toLowerCase(dictionaryMetadata.getLocale())))
>
> I think they should be something like:
>
>            if (isInDictionary(wordChecked)
>                || (dictionaryMetadata.convertCase
>                && isMixedCase(wordChecked)
>                && isInDictionary(wordChecked
>                    .toLowerCase(dictionaryMetadata.dictionaryLocale))))

Fixed!
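
To illustrate why the two conditions differ, here is a small
self-contained sketch. It does not use the real morfologik classes: the
dictionary is a plain Set, isMixedCase() is a simplified stand-in (at
least one upper-case and one lower-case letter), and the class and method
names are invented for the example. With the old condition a word that
is only present in lower case can never match, because the first
isInDictionary() already fails.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CaseCheckSketch {
    private final Set<String> dictionary; // stand-in for the real dictionary
    private final boolean convertCase;    // stand-in for the metadata flag
    private final Locale locale;

    CaseCheckSketch(Set<String> dictionary, boolean convertCase, Locale locale) {
        this.dictionary = dictionary;
        this.convertCase = convertCase;
        this.locale = locale;
    }

    boolean isInDictionary(String word) {
        return dictionary.contains(word);
    }

    /** Simplified assumption: both upper- and lower-case letters occur. */
    static boolean isMixedCase(String word) {
        boolean upper = false;
        boolean lower = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            upper |= Character.isUpperCase(c);
            lower |= Character.isLowerCase(c);
        }
        return upper && lower;
    }

    /** Old condition: the leading && makes the lower-case lookup useless. */
    boolean oldCondition(String word) {
        return isInDictionary(word)
                && convertCase
                && isMixedCase(word)
                && isInDictionary(word.toLowerCase(locale));
    }

    /** Proposed condition: exact match, OR a lower-cased match if allowed. */
    boolean newCondition(String word) {
        return isInDictionary(word)
                || (convertCase
                    && isMixedCase(word)
                    && isInDictionary(word.toLowerCase(locale)));
    }

    public static void main(String[] args) {
        CaseCheckSketch s = new CaseCheckSketch(
                new HashSet<>(Arrays.asList("casa")), true, Locale.ROOT);
        System.out.println(s.oldCondition("cAsa") + " " + s.newCondition("cAsa"));
    }
}
```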

I tried to add your fixes, but your code is now quite far away from ours,
so diff does not give any meaningful output. Please review the code on
GitHub and, if needed, file an issue for any changes that still need to
be made.

Regards,
Marcin

>
> Regards,
> Jaume Ortolà
> www.riuraueditors.cat
>
>
>
> 2013/7/15 Marcin Miłkowski <list-addr...@wp.pl>:
>> On 2013-07-15 10:56, Marcin Miłkowski wrote:
>>> Hi,
>>>
>>> Dawid just released morfologik 1.7 on Maven. So we can actually go on
>>> and include a newer version in LT.
>>>
>>> The new version still does not support compounding but it has all the
>>> features required for getting better diacritic suggestions.
>>
>> Here's the documentation:
>>
>> http://wiki.languagetool.org/hunspell-support#toc5
>>
>> Best,
>> Marcin
>>
>>
>>> Best,
>>> Marcin
>>>
>>> On 2013-07-02 08:59, Marcin Miłkowski wrote:
>>>> On 2013-07-02 01:11, Jaume Ortolà i Font wrote:
>>>>> Hi Marcin,
>>>>>
>>>>> I have been using the still unreleased code of morfologik-stemming and I
>>>>> have made improvements to Speller.java for some previously unforeseen
>>>>> cases. See the attachment.
>>>>>
>>>>> In order to complete the development, and test & debug with all
>>>>> languages, perhaps we could include temporarily the morfologik module
>>>>> inside LanguageTool. This will make things easier. What do you think?
>>>>
>>>> No. I should make a release, forking morfologik makes no sense to me.
>>>>
>>>> The only thing that stops me is the lack of time to work on compounds.
>>>>
>>>> Best,
>>>> Marcin
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> This SF.net email is sponsored by Windows:
>>>>
>>>> Build for Windows Store.
>>>>
>>>> http://p.sf.net/sfu/windows-dev2dev
>>>> _______________________________________________
>>>> Languagetool-devel mailing list
>>>> Languagetool-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>>>
>>>
>>

