Hi, Andreas,

<< I don't quite understand what you mean by "I’m having a hard time
thinking through the process of generating the
candidate suffix set using set forms" >>

It is my usual roundabout way of saying "I don't know how to do this." ;)

I'm looking at your code as we speak.

Thanks,
Tuba

On Mon, Aug 8, 2011 at 1:13 AM, Andreas Kostler <
andreas.koest...@leica-geosystems.com> wrote:

> Hi Tuba,
> I don't quite understand what you mean by "I’m having a hard time
> thinking through the process of generating the
> candidate suffix set using set forms" but I have created a porter
> stemmer for English in the past.
> I understand that's not what you're looking for, but it is more of a
> framework for building stemmers:
>
> You specify rules like:
> {:c? condition :s1 "abc" :s2 "efg" :a action}
> which reads: if condition is met, replace s1 with s2 and execute action.
> Where s1 could be a suffix etc. All you need to do is specify these rules.
> Have a browse
> https://github.com/AndreasKostler/Stout
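
A minimal sketch of how a rule map of that shape might be interpreted. This is not taken from the Stout repository: `apply-rule` and `sses-rule` are hypothetical names, and testing the condition against the stem left after removing s1 is an assumption borrowed from how Porter-style rules usually work. It needs Clojure 1.8 or later for `clojure.string/ends-with?`.

```clojure
(require '[clojure.string :as str])

;; Hypothetical rule interpreter; not Stout's actual API.
(defn apply-rule
  "If `word` ends with :s1 and the condition :c? holds for the stem left
  after removing :s1, replaces :s1 with :s2 and passes the result to the
  action :a. Otherwise returns `word` unchanged."
  [{:keys [c? s1 s2 a]} word]
  (if (str/ends-with? word s1)
    (let [stem (subs word 0 (- (count word) (count s1)))]
      (if (c? stem)
        (a (str stem s2))
        word))
    word))

;; Illustrative rule in the spirit of a Porter step: "sses" -> "ss".
;; The condition always holds and the action does nothing further.
(def sses-rule {:c? (constantly true) :s1 "sses" :s2 "ss" :a identity})

(apply-rule sses-rule "caresses") ; "caress"
(apply-rule sses-rule "cloud")    ; "cloud" (rule does not apply)
```
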
>
> Cheers
> Andreas
>
>
> On 8 August 2011 16:16, Tuba Lambanog <tuba.lamba...@gmail.com> wrote:
> >
> > Hello,
> >
> > I’m doing a word stemmer for a non-English language. A stemmer parses
> > a word into its word parts: prefixes, roots, suffixes. The input word
> > is at least a root word (English example would be ‘cloud’), but can be
> > any combination of prefix(es) and a root (e.g., 'pre-nuptial'), or a
> > root and suffix(es) (‘cloudy’), or all three ('unidirection'). A
> > sequence of more than one prefix in a word is considered one
> > occurrence of a prefix, and similarly for complex suffixes; thus,
> > ‘directional’ is considered to have the ‘single’ suffix ‘ional’. The
> > prefixes, roots, and suffixes are in their own set data structure.
> >
> > The approach I am pursuing is to create a set of potential suffixes
> > that the input word contains. Assume, for simplicity, that the suffix
> > set consists of #{-or, -er, -al, -ion, -ional, -able}. The input
> > ‘directional’ would have the candidate suffix set #{-al, -ional}. Now,
> > drop the longest suffix (‘-ional’) from the input, then check whether
> > the remaining string (‘direct’) is a root; if it is, done. If not,
> > try the next suffix (‘-al’) in the candidate suffix set. Prefixes
> > will be similarly processed. Input words with both prefixes and
> > suffixes will be fun to do ;)
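
The suffix step described above could be sketched roughly as follows. The suffix set is the one from the message with the leading hyphens dropped; the `roots` set and the function names `candidate-suffixes` and `strip-suffix` are made up for illustration. It assumes Clojure 1.8 or later for `clojure.string/ends-with?`.

```clojure
(require '[clojure.string :as str])

;; Example data from the message; `roots` is a tiny stand-in for a real root set.
(def suffixes #{"or" "er" "al" "ion" "ional" "able"})
(def roots #{"direct" "cloud"})

(defn candidate-suffixes
  "Returns the set of known suffixes that `word` ends with."
  [suffixes word]
  (set (filter #(str/ends-with? word %) suffixes)))

(defn strip-suffix
  "Tries the candidate suffixes longest first. Returns [root suffix] for
  the first remainder found in `roots`, or nil if no candidate works."
  [roots suffixes word]
  (some (fn [suffix]
          (let [stem (subs word 0 (- (count word) (count suffix)))]
            (when (contains? roots stem)
              [stem suffix])))
        (sort-by count > (candidate-suffixes suffixes word))))

(candidate-suffixes suffixes "directional") ; #{"al" "ional"}
(strip-suffix roots suffixes "directional") ; ["direct" "ional"]
```

Filtering the full suffix set keeps the candidate step a one-liner; the same two functions would work for prefixes with `str/starts-with?` and the `subs` call adjusted to drop characters from the front.
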
> >
> > I’m having a hard time thinking through the process of generating the
> > candidate suffix set using set forms, and I’m beginning to think I
> > have selected an arduous path (for me).
> >
> > Thoughts?
> >
> > Thanks.
> > Tuba
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to clojure@googlegroups.com
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > clojure+unsubscr...@googlegroups.com
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en