Hi Daniel and Simon,

Daniel Naber wrote:
On Friday, 26 May 2006 18:57, Simon Brouwer wrote:

It would be great if the relevant data for the grammar checker would be
easy to generate from a list of such word groups.

It's easy to write such rules in LanguageTool. For example, this "of cause" rule will complain about "of cause" unless it's followed by "and" or "to":

[snip]

If you meant to find those potentially incorrect phrases automatically, that doesn't seem to be trivial.


Well, it could be handled automatically. What you need is a sentence corpus and a collocation finder. Based on frequency in a large corpus, you can find the most frequently recurring phrases in a language. Then you check how often the parts of each collocation appear without the other parts of the same collocation. "Hard-linked" collocations, whose parts rarely occur apart, should be visible this way.
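
To make the idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under simplifying assumptions: whitespace tokenization, two-word phrases only, and arbitrary thresholds. The function name and parameters are made up for this example and are not taken from any existing tool.

```python
from collections import Counter


def find_hard_linked_bigrams(sentences, min_count=3, min_cohesion=0.8):
    """Return frequent word pairs whose parts rarely occur without each other.

    Hypothetical helper for illustration; thresholds are arbitrary assumptions.
    """
    word_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))

    results = []
    for (first, second), together in bigram_counts.items():
        # Step 1: keep only pairs that recur often enough in the corpus.
        if together < min_count:
            continue
        # Step 2: share of each word's occurrences that fall inside the pair.
        first_share = together / word_counts[first]
        second_share = together / word_counts[second]
        cohesion = min(first_share, second_share)
        # A high cohesion means neither word appears much on its own.
        if cohesion >= min_cohesion:
            results.append(((first, second), together, cohesion))

    # Most tightly linked pairs first, then most frequent.
    results.sort(key=lambda item: (-item[2], -item[1]))
    return results


if __name__ == "__main__":
    corpus = ["ad hoc fix", "an ad hoc rule", "ad hoc again", "the fix the rule"]
    for pair, count, cohesion in find_hard_linked_bigrams(corpus):
        print(pair, count, cohesion)
```

On a real corpus you would want proper tokenization and a statistical association measure instead of a raw ratio, but the two steps (count recurring phrases, then check how often the parts occur apart) are the same.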

Here's the link to the web crawler/collocation finder, GPL:

http://www.mimuw.edu.pl/polszczyzna/kolokacje/index-en.htm

I think you could find such phrases with it. It's not entirely trivial, since it requires some understanding of the software and corpus tools, but anyway, it's not so hard ;)

Regards,
Marcin
PS. I have never tried it myself, but I will, I promise ;)
