Hi there,

We are rewriting AI::Categorizer from scratch (maybe not all of it, only the parts we need). It was just too difficult to customize, and since this is the third time I have run into this problem, I decided to try a rewrite this time (and my employer agreed! :).
I am trying to split it into functional parts that have functional interfaces wherever possible and are generally much simpler. One of these separate parts would be something that parses a text into words and counts them. This sounds like a very generic job, but the requirements for classifier input are actually quite specific: for example, we don't need 100% linguistic correctness, we process HTTP addresses in a special way and filter out other HTML, the list of stop words is specific, etc. Maybe it would also be useful for search engines or other things, I don't know, but for now I only want something that uses heuristics suited to our specific case.

What would you name that part? AI::WordSplitter (because it uses heuristics)? Text::WordCounter?

-- 
Zbigniew Lukasiak
http://brudnopis.blogspot.com/
http://perlalchemy.blogspot.com/