On 4/16/2015 10:43 AM, Sarang Shrivastava wrote:
Yes, indeed, CRM114 offers many criteria for categorizing data, and
that can be done via a host of methods, including regexes, approximate
regexes, a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW,
Correlation, KNN/Hyperspace, or Bit Entropy.
We can take ideas from them and develop our own plugin that has the
capability to compete with CRM114. After all, there is no place like
home. I look forward to working on these, provided that my proposal
gets accepted.
I look forward to that as well. Many of these algorithms are outside of
my area of expertise, but testing and tweaking them in real-world
environments to gauge efficacy is something that I can do very well.
A thought to consider: does using custom plugins hinder the performance
of SA in terms of speed? No doubt CRM114 is good at classifying spam
and ham, but does it hamper speed in any case?
What do you guys say about including these in SA itself, if possible?
The plugin engine in SA is refined and stable. A plugin that is not
enabled has no effect on performance that I can think of in any way,
shape or form.
Beyond that, if a plugin is enabled, the performance of the plugin is
important. Something like what you are discussing is of particular
interest to the higher volume users who will be the most sensitive to
performance. But if it's an effective tool to classify emails, people
might be more accommodating.
But performance and scalability are among the reasons I've pointed you
towards Redis as a backend. Now, the algorithms' data store might need
more than a hash store can provide. Txrep, for example, relies on SQL
and I can't think of a way to make it more Redis-compatible.
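
To illustrate what a plain hash store handles comfortably, here is a
minimal sketch, assuming per-token spam/ham counters of the sort a
Bayes-style classifier keeps. Everything in it is hypothetical: the
HashStore class is an in-memory stand-in that only mimics the idea of
the Redis HINCRBY and HGET commands, and the "tok:" key layout and the
learn/spam_ratio helpers are made up for illustration. It is not SA's
actual Redis backend.

```python
from collections import defaultdict


class HashStore:
    """Hypothetical in-memory stand-in for a Redis-style hash store."""

    def __init__(self):
        self._hashes = defaultdict(dict)

    def hincrby(self, key, field, amount=1):
        # Same idea as Redis HINCRBY: add to a field, starting from 0 if missing.
        value = self._hashes[key].get(field, 0) + amount
        self._hashes[key][field] = value
        return value

    def hget(self, key, field):
        # Same idea as Redis HGET: return the field's value, or None if absent.
        return self._hashes.get(key, {}).get(field)


def learn(store, tokens, is_spam):
    """Bump the spam or ham counter for each token (assumed key layout)."""
    field = "spam" if is_spam else "ham"
    for token in tokens:
        store.hincrby("tok:" + token, field)


def spam_ratio(store, token):
    """Fraction of a token's sightings that were in spam, or None if unseen."""
    spam = store.hget("tok:" + token, "spam") or 0
    ham = store.hget("tok:" + token, "ham") or 0
    total = spam + ham
    return spam / total if total else None
```

Lookups like these touch one key at a time, which is where a hash store
does well. An algorithm that needs to query or aggregate across many
records at once is the kind of case where a hash store may fall short.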
Regards,
KAM