On Monday 13 Jun 2005 14:52, Markus Wiederkehr wrote:
> On 6/13/05, Andy Roberts <[EMAIL PROTECTED]> wrote:
> > On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > > I see, the list of exceptions makes this a lot more complicated than I
> > > thought... Thanks a lot, Erik!
> >
> > I expect you'll need to do some pre-processing. Read in your text into a
> > buffer, line-by-line. If a given line ends with a hyphen, you can
> > manipulate the buffer to merge the hyphenated tokens.
>
> As Erik wrote it is not that simple, unfortunately. For example, if
> one line ends with "read-" and the next line begins with "only" the
> correct word is "read-only" not "readonly". Whereas "work-" and "ing"
> should of course be merged into "working".
>
> Markus

Perhaps you do some crude checking against a dictionary. Combine the word 
anyway and check if it's in the dictionary. If so, keep it merged otherwise, 
it's a compound and so revert back to the hyphenated form.

Word lists come part of all good OSS dictionary projects, as well as other 
language resources, like the BNC word lists etc.

Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to