Hi Daniel and Simon,
Daniel Naber wrote:
On Friday 26 May 2006 18:57, Simon Brouwer wrote:
It would be great if
the relevant data for the grammar checker would be easy to generate from
a list of such word groups.
It's easy to write such rules in LanguageTool. For example, this "of cause" rule will complain about
"of cause" unless it's followed by "and" or "to":
[snip]
If you meant to find those potentially incorrect phrases automatically, that
doesn't seem to be trivial.
Well, it could be handled automatically. What you need is a sentence
corpus and a corpus collocation finder. Based on frequency in a large
corpus, you can find the most recurrent phrases in a language. Then you
check whether the parts of each collocation appear significantly often
without the other parts of the same collocation. "Hard-linked"
collocations should show up this way, because their parts rarely occur
on their own.
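A minimal sketch of those two steps, assuming the corpus is already a
list of plain-text sentences. It only looks at adjacent word pairs and
uses simple conditional frequencies, not whatever statistics a dedicated
collocation finder applies; the function names, the naive tokenizer and
the thresholds are hypothetical choices for illustration.

```python
from collections import Counter
from collections.abc import Iterable


def tokenize(sentence: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return sentence.lower().split()


def hard_linked_bigrams(
    sentences: Iterable[str],
    min_count: int = 2,
    min_cohesion: float = 0.9,
) -> list[tuple[str, str, int, float, float]]:
    """Find recurrent word pairs whose parts rarely occur apart.

    For each bigram (w1, w2) seen at least min_count times:
      forward  = share of w1's occurrences that are followed by w2
      backward = share of w2's occurrences that are preceded by w1
    A pair is reported when either share reaches min_cohesion,
    i.e. at least one part hardly ever appears without the other.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    results = []
    for (w1, w2), n in bigrams.items():
        # Step 1: keep only recurrent pairs.
        if n < min_count:
            continue
        # Step 2: check how often each part appears without the other.
        forward = n / unigrams[w1]
        backward = n / unigrams[w2]
        if max(forward, backward) >= min_cohesion:
            results.append((w1, w2, n, forward, backward))
    # Most recurrent pairs first.
    return sorted(results, key=lambda r: r[2], reverse=True)


corpus = [
    "he said of course",
    "of course we can",
    "it is of course true",
    "the book of the year",
]
print(hard_linked_bigrams(corpus))
# [('of', 'course', 3, 0.75, 1.0)]
```

Here "course" never appears without "of" (backward = 1.0), while "of"
is common on its own (forward = 0.75), so taking the larger of the two
shares is what lets a pair like "of course" through. A real run would
need a much larger corpus and probably a proper association measure.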
Here's a link to a web crawler/collocation finder (GPL-licensed):
http://www.mimuw.edu.pl/polszczyzna/kolokacje/index-en.htm
I think you could find such phrases with it. It's not entirely trivial
and requires some understanding of the software and of corpus tools,
but... anyway, it's not so hard ;)
Regards,
Marcin
PS. I haven't tried it yet, but I will, I promise ;)