Hi there,

We are rewriting AI::Categorizer from scratch - maybe not all of it,
only the parts that we need.  It was just too difficult to customize,
and since this is the third time I have run into this problem I decided
to try a rewrite this time (and my employer agreed! :).

I am trying to split it into functional parts with functional
interfaces wherever that is possible, and to make it generally much
simpler.

One of the separate parts would be something that parses a text into
words and counts them.  This sounds like a very generic job, but
actually the requirements for the input to a classifier are quite
specific: for example, we don't need 100% linguistic correctness, we
process HTTP addresses in a special way and filter other HTML out, the
list of stop words is specific, etc.  Maybe it would also be useful for
search engines or other things - I don't know - but for now I only
want something that uses heuristics useful in our specific case.
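
To make those requirements concrete, here is a rough sketch of that
kind of word counter.  It is in Python rather than Perl purely for
brevity, and it is not code from AI::Categorizer or from the rewrite:
the names count_words and STOP_WORDS, the tiny stop-word list, and the
choice to reduce each HTTP address to a single "host:<hostname>" token
are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be specific to the data.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter of the tokens found in text."""
    counts = Counter()

    # Special handling for HTTP addresses (an assumed policy): count each
    # address as one "host:<hostname>" token and remove it from the text.
    def url_to_token(match):
        rest = match.group(0).split("://", 1)[1]
        host = rest.split("/", 1)[0].rstrip(".,;)").lower()
        counts["host:" + host] += 1
        return " "

    text = URL_RE.sub(url_to_token, text)
    # Filter the remaining HTML out by dropping anything that looks like a tag.
    text = TAG_RE.sub(" ", text)
    # Crude tokenization: no attempt at 100% linguistic correctness.
    for word in WORD_RE.findall(text.lower()):
        if word not in stop_words:
            counts[word] += 1
    return counts
```

Calling count_words() on a string of HTML returns a Counter keyed by
word, with tags dropped, stop words skipped, and every address folded
into its host token.  Each heuristic is a single step, so any of them
can be swapped out without touching the others.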

What would you name that part?

AI::WordSplitter (because it uses heuristics)
Text::WordCounter?

--
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/
