Re: [lingu-dev] Component for guessing the language of a text

Thomas Lange Tue, 06 Jun 2006 01:44:49 -0700

Hi Jocelyn and all,

Jocelyn Merand wrote:


> I'm proud to present to you the project I will work on this summer (at
> least). It seems that the OOo community wants to have a new way to
> guess the language of texts (not only words and sentences but also
> longer texts).

...

The obvious task at hand would be to use this component for the
context menu of misspelled text, so the current main use would be for a
single word only. (OK, we can improve somewhat here by trying to get
all the surrounding text of the same language...)

Since a single word is quite a limited sample text, I think we may run
into cases where the statistical approach that Jocelyn already pointed
out (and that is definitely the method to use for longer text parts)
will not always work that well.

Since the text is available as a Unicode string, the character codes
can be used for some languages to improve the result.
There are probably other approaches for this.
Thus I'd like to ask you all:
Do you have any idea or algorithm at hand that can be used to improve
the language guessing for a single word?
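
To make the character-code idea concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions: it derives a
script from the first word of each character's Unicode name (a rough
heuristic, not the official Unicode Script property), and the
SCRIPT_TO_LANGUAGES table and both function names are hypothetical and
far from complete. Nothing here is part of the proposed component.

```python
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny table (an assumption for illustration):
# scripts that narrow a word down to one or a few languages.
SCRIPT_TO_LANGUAGES = {
    "GREEK": ["el"],
    "HEBREW": ["he", "yi"],
    "THAI": ["th"],
    "HANGUL": ["ko"],
    "HIRAGANA": ["ja"],
    "KATAKANA": ["ja"],
    "CYRILLIC": ["ru", "uk", "bg", "sr"],
}


def dominant_script(word):
    """Return the most frequent script among the letters of `word`.

    The script is approximated by the first token of the Unicode
    character name, e.g. 'GREEK' for 'GREEK SMALL LETTER ALPHA'.
    Returns None if the word contains no letters.
    """
    counts = Counter()
    for ch in word:
        if not ch.isalpha():
            continue  # skip digits, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def candidate_languages(word):
    """Return the languages suggested by the script alone.

    An empty list means the script gives no useful hint (for example
    Latin, which is shared by far too many languages).
    """
    return SCRIPT_TO_LANGUAGES.get(dominant_script(word), [])


if __name__ == "__main__":
    for sample in ["Ελληνικά", "hello", "привет"]:
        print(sample, dominant_script(sample), candidate_languages(sample))
```

For a Latin-script word this check returns no candidates, so it can
only serve as a cheap filter in front of the statistical guesser, not
as a replacement for it.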

Or do you have any other advice?


Best Regards,
Thomas

