On Sep 20, 2006, at 12:07 AM, Daniel Naber wrote:

> Writing a decompounder is difficult, as you need both a large
> dictionary *without* compounds and a set of rules to avoid splitting
> at too many positions.

Conceptually, how different is the problem of decompounding German from tokenizing languages such as Thai and Japanese, where "words" are not separated by spaces and may consist of multiple characters?
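At bottom, both problems reduce to segmenting an unspaced string against a lexicon. As a purely illustrative sketch (the lexicon and function names here are hypothetical, not from any Lucene code), a minimal dynamic-programming word-break looks like this:

```python
def segment(text, lexicon):
    """Return one segmentation of `text` into lexicon words, or None.

    best[i] holds a segmentation of text[:i], or None if no split of
    that prefix into lexicon words exists.
    """
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []  # empty prefix segments trivially
    for i in range(1, n + 1):
        for j in range(i):
            # Extend a known-good prefix split with one lexicon word.
            if best[j] is not None and text[j:i] in lexicon:
                best[i] = best[j] + [text[j:i]]
                break
    return best[n]

# Toy German lexicon of non-compound stems (illustrative only).
lexicon = {"fussball", "mannschaft", "fuss", "ball"}
print(segment("fussballmannschaft", lexicon))
# → ['fussball', 'mannschaft']
```

The same routine would segment Thai or Japanese text given a suitable lexicon, which is what makes the two problems look alike; the hard part in both cases is exactly what Daniel points out — ranking or suppressing the many spurious splits ("fuss" + "ball" + …) that a bare dictionary admits.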

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]