On Mon, Aug 8, 2011 at 1:46 PM, Tuba Lambanog <tuba.lamba...@gmail.com> wrote:
> Hello,
>
> I’m doing a word stemmer for a non-English language. A stemmer parses
> a word into its word parts: prefixes, roots, suffixes. The input word
> is at least a root word (English example would be ‘cloud’), but can be
> any combination of prefix(es) and a root (e.g., 'pre-nuptial'), or a
> root and suffix(es) (‘cloudy’), or all three ('unidirection'). A
> sequence of more than one prefix in a word is considered one
> occurrence of a prefix, and similarly for complex suffixes; thus,
> ‘directional’ is considered to have the ‘single’ suffix ‘ional’. The
> prefixes, roots, and suffixes are in their own set data structure.
>
> The approach I am pursuing is to create a set of potential suffixes
> that the input word contains. Assume, for simplicity, that the suffix
> set consists of #{-or, -er, -al, -ion, -ional, -able}. The input
> ‘directional’ would have the candidate suffix set #{-al, -ional}. Now,
> drop the longest suffix (‘ional’) from the input, then check whether the
> remaining string (‘direct’) is a root; if it is, done. If not,
> try the next suffix (‘-al’) in the candidate suffix set. Prefixes
> will be similarly processed. Input words with both prefixes and
> suffixes will be fun to do ;)
>
> I’m having a hard time thinking through the process of generating the
> candidate suffix set using set forms, and I’m beginning to think I
> have selected an arduous path (for me).
>
> Thoughts?
>

Somewhat off-topic, perhaps, but have you looked at Snowball
(http://snowball.tartarus.org/)?
Its algorithm is different, but the language used to describe stemmers
there is almost Lisp-like and may be useful, at least as a reference.
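
For comparison, the longest-suffix-first idea from the quoted message
could be sketched in Clojure roughly as below. This is only a minimal
sketch under assumptions: the suffix set is the one given in the quoted
message, while the `roots` set and the function names
`candidate-suffixes` and `strip-suffix` are hypothetical, made up for
illustration.

```clojure
;; clojure.string/ends-with? needs Clojure 1.8 or later.
(require '[clojure.string :as str])

;; Suffixes as listed in the quoted message (written without hyphens).
(def suffixes #{"or" "er" "al" "ion" "ional" "able"})

;; Hypothetical root set, for illustration only.
(def roots #{"direct" "cloud"})

(defn candidate-suffixes
  "Returns the suffixes that `word` ends with, longest first."
  [word]
  (->> suffixes
       (filter #(str/ends-with? word %))
       (sort-by count >)))

(defn strip-suffix
  "Returns [root suffix] for the longest candidate suffix whose removal
  leaves a known root, [word nil] if `word` is itself a root, and nil
  if no split is found."
  [word]
  (if (contains? roots word)
    [word nil]
    (some (fn [suffix]
            (let [stem (subs word 0 (- (count word) (count suffix)))]
              (when (contains? roots stem)
                [stem suffix])))
          (candidate-suffixes word))))
```

With these sample sets, (strip-suffix "directional") tries "ional"
before "al" and stops at the first stem found in `roots`. Prefixes
could presumably be handled the same way with a starts-with test.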

-- 
Petr Gladkikh
