This is excellent news! It pretty much answers a question I was going to
ask on this list about automating part of the rule-creation process.

With the recently updated
http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/de/words-similar.txt
and the Wikipedia data that we use for rule testing and such, I hope it
becomes doable for German.
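
To make the "likely context words" idea from Ruud's message (quoted
below) a bit more concrete, here is a minimal sketch in Python. It is
not LanguageTool code and not the Brill-tagger approach Marcin
describes; the function names (context_counts, likelier_word), the
margin parameter and the four-sentence toy corpus are all made up for
illustration. It assumes a clean corpus and a known confusion set, counts
which words co-occur with each confusable word, and only suggests the
alternative when it fits the sentence clearly better (add-one smoothed
log-likelihoods, no prior).

```python
from collections import Counter
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def context_counts(corpus, confusion_set):
    """For each confusable word, count the words sharing a sentence with it."""
    counts = {word: Counter() for word in confusion_set}
    for sentence in corpus:
        tokens = tokenize(sentence)
        for word in confusion_set:
            if word in tokens:
                counts[word].update(t for t in tokens if t not in confusion_set)
    return counts


def likelier_word(sentence, used_word, counts, margin=1.0):
    """Return the alternative if it fits the context clearly better, else None.

    margin is a hypothetical threshold on the log-likelihood difference;
    raising it suppresses more warnings.
    """
    context = [t for t in tokenize(sentence) if t not in counts]
    vocab = set().union(*counts.values())

    def score(word):
        total = sum(counts[word].values())
        # Add-one smoothing so unseen context words do not zero out the score.
        return sum(
            math.log((counts[word][t] + 1) / (total + len(vocab)))
            for t in context
        )

    best = max(counts, key=score)
    if best != used_word and score(best) - score(used_word) > margin:
        return best
    return None


# Toy corpus (assumption: a real run would use a large, clean corpus).
corpus = [
    "de boom is geplant in de tuin",
    "de struik is geplant naast de boom",
    "het project is gepland voor mei",
    "de vergadering is gepland voor het project",
]
counts = context_counts(corpus, {"geplant", "gepland"})
print(likelier_word("de boom is gepland in de tuin", "gepland", counts))
```

On this toy data the last line prints "geplant", while a sentence such
as "het project is gepland voor mei" yields None, i.e. no warning. The
margin is what would suppress the unnecessary warnings Ruud mentions;
a sensible value would have to be tuned on real data.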

Marcin Miłkowski wrote:
> Hi all,
> 
> Actually, word confusion is one area where a lot of experiments have 
> been done. I also ran an experiment with a Brill tagger and it worked 
> really well with English. It should be easy with Dutch as well:
> 
> http://marcinmilkowski.pl/downloads/automating_rules_full.pdf
> 
> Unfortunately, the process does not produce full-blown LT rules, just 
> Brill tagger rules, but all you need is a big, clean corpus without any 
> mistakes and a list of confusions. The rest is pretty much automatic, and 
> the quality is pretty high.
> 
> It was on our Google Summer of Code list exactly for this reason - 
> making this process automatic seems very easy, as there is a Java 
> version of a Brill tagger that could be used, and we could fairly easily 
> convert the rules to our formalism.
> 
> Another option would be to use statistical modeling the way it is 
> used in After the Deadline. I'm not sure how good it is at such things, 
> as it never really impressed me, given the high number of alarms it raised.
> 
> Regards
> Marcin
> 
> On 2012-05-16 20:58, Juan Martorell wrote:
>> The problem you set out is entirely semantic and common to all
>> languages. There is no way to distinguish the two verbs except
>> semantically. That introduces a new category for comparison, perhaps
>> category trees, and IMHO that would exceed the scope of the project.
>>
>> A possible shortcut is introducing semantic categories as mock POS and
>> checking their compatibility within the rules as if it were a common
>> agreement. I discourage this because it denormalizes the tagger dictionary.
>>
>> I therefore recommend the brute-force approach, provided that the chance
>> of committing such mistakes justifies the investment.
>>
>> However, I'd rather focus on lightening the software than on swelling it
>> with new features. The lighter, faster and easier to use it is, the more
>> successful it will be.
>>
>> Best regards,
>> Juan
>>
>> 2012/5/16 R.J. Baars <r.j.ba...@xs4all.nl <mailto:r.j.ba...@xs4all.nl>>
>>
>>     There is quite a bit of word confusion going on in Dutch. An example:
>>
>>     geplant (planted) versus gepland (planned).
>>
>>     This is not a grammatical issue, but actually using the wrong word,
>>     thereby altering the intention of the sentence.
>>
>>     Neither is wrong. Both are very common. Nevertheless, a warning adds
>>     value. What I need is suppression of lots of unnecessary warnings.
>>
>>     I could add exceptions to the warning on 'geplant' for every sentence
>>     that contains plant, tree, shrub, etc.,
>>     and exceptions to the warning on 'gepland' for every sentence
>>     containing
>>     project, activity, planning, etc.
>>
>>     But would it be possible to create a 'context' from the sentence and
>>     check whether the word is likely in that context?
>>
>>     From the large corpus we built, it would be possible to determine the
>>     'likely context words' for any confusable word.
>>
>>     Has anyone ever thought about a way to implement this kind of check
>>     in LT?
>>
>>     Any thoughts to do it within existing functionality? Is Dutch the only
>>     language having a confusion issue like this?
>>
>>     Ruud
>>
>>
>>     
>> ------------------------------------------------------------------------------
>>     Live Security Virtual Conference
>>     Exclusive live event will cover all the ways today's security and
>>     threat landscape has changed and how IT managers can respond.
>>     Discussions
>>     will include endpoint security, mobile security and the latest in
>>     malware
>>     threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>     _______________________________________________
>>     Languagetool-devel mailing list
>>     Languagetool-devel@lists.sourceforge.net
>>     <mailto:Languagetool-devel@lists.sourceforge.net>
>>     https://lists.sourceforge.net/lists/listinfo/languagetool-devel

-- 
Jan Schreiber, M.A.
Universität Duisburg-Essen
Fachbereich Geisteswissenschaften
Fach Philosophie
D-45117 Essen

mailto:jan.schrei...@uni-due.de
http://www.uni-due.de/~gph120/
OpenPGP-Schlüssel:
http://www.uni-due.de/~gph120/diverses/0x06C970E5%20pub.asc
Fingerprint: 57DD DD97 99C6 09E7 8B41 7DAD 63A3 1F42 06C9 70E5
