Well... this kind of goes along with the idea of "Bayes chains". You can look
into which pairs/trios of Bayes tokens hit the most spam and the least ham.
The same goes for rules. There are some scripts around the community that
report the top-hitting rules, which might come in very useful.
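To make the pair-hunting idea concrete, here is a rough sketch of how such a tally could look. It is not one of the community scripts mentioned above. It assumes you already have, for each message in a hand-classified corpus, a spam/ham flag and the set of rule names that fired; the function name `rank_rule_pairs`, the input shape, and the rule names in the usage lines are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def rank_rule_pairs(messages, min_spam_hits=2):
    """Rank pairs of rules by how cleanly they separate spam from ham.

    messages: iterable of (is_spam, rules_hit) tuples, where rules_hit is an
    iterable of rule names that fired on that message (hypothetical format).
    Returns a list of (pair, spam_hits, ham_hits), fewest ham hits first,
    then most spam hits.
    """
    spam_hits = Counter()
    ham_hits = Counter()
    for is_spam, rules in messages:
        target = spam_hits if is_spam else ham_hits
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(rules)), 2):
            target[pair] += 1

    ranked = [
        (pair, hits, ham_hits[pair])
        for pair, hits in spam_hits.items()
        if hits >= min_spam_hits
    ]
    ranked.sort(key=lambda item: (item[2], -item[1], item[0]))
    return ranked


# Toy corpus with placeholder rule names.
corpus = [
    (True, ["RULE_A", "RULE_B", "RULE_C"]),
    (True, ["RULE_A", "RULE_B"]),
    (False, ["RULE_A", "RULE_C"]),
    (False, ["RULE_B"]),
]
for pair, spam, ham in rank_rule_pairs(corpus, min_spam_hits=1):
    print(pair, "spam:", spam, "ham:", ham)
```

Swapping the `2` in `combinations` for `3` would give trios instead of pairs, at the cost of many more combinations to count.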
Once you find these magic pairs/trios, it should be relatively easy to meta
them together. Although I'm not sure how you would do that on the Bayes token
side, as I think it is kind of already handled. It's public knowledge that I
dislike Bayes and don't use it :)
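For the rules side, "meta them together" could be as simple as generating a few lines of SA config per pair. The sketch below just builds the text using the standard `meta`, `describe`, and `score` directives; the helper name `format_meta_rule`, the rule names, and the score are placeholders, and any real score would have to come from testing against your own corpus.

```python
def format_meta_rule(name, rules, score, description=None):
    """Build SpamAssassin config text for a meta rule that fires only
    when every rule in `rules` has fired.

    All names and the score passed in are the caller's choice; nothing
    here checks that the sub-rules actually exist.
    """
    if len(rules) < 2:
        raise ValueError("a meta rule needs at least two sub-rules")
    expression = " && ".join(rules)
    lines = [f"meta {name} ({expression})"]
    if description:
        lines.append(f"describe {name} {description}")
    lines.append(f"score {name} {score:.1f}")
    return "\n".join(lines)


# Placeholder names and score, for illustration only.
print(format_meta_rule(
    "LOCAL_META_AB",
    ["RULE_A", "RULE_B"],
    1.5,
    description="RULE_A and RULE_B hit together",
))
```

The output can be pasted into a local `.cf` file and checked with `spamassassin --lint` before use.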
It's a good idea, and probably the next best step to look at.
HTH,
In this same vein of exploring concepts like applying boosting algorithms or
using meta rulesets to enhance the SA classification process: I've been
looking for an interesting doctoral dissertation topic in the spam domain for
some time now, and was wondering whether folks in the SA community had some
ideas rolling around in the back of their minds that would lend themselves to
doctoral-level research. Perhaps some area you'd really like to explore
yourself, "if only you had the time." :-)
My program in CS is especially geared towards folks with a lot of hands-on,
real-world IT experience, so topics with an applied research & development
bent and a serious coding component are quite OK. Any ideas, interesting
leads, or useful pointers would be much appreciated.
Thanks muchly for your thoughts.
--ted
Sidney Markowitz wrote:
Fred wrote:
There was similar work being done in the past to identify rules to be
grouped into new meta rules; this (w|c)ould achieve similar results.
http://bugzilla.spamassassin.org/show_bug.cgi?id=1363
I think I'm missing something here. Are you saying that automatically
grouping rules into meta rules that have similar classification properties
is equivalent to boosting? Or do you mean that it is another approach that
also can improve performance of weak learners?
In any case, you have given me an idea for the microarray gene expression
problem, so thanks! :-)
-- sidney
--
================================================================
Ted Markowitz
Chief Architect
Cognosys LLC (http://www.cognosys.net)
10 Hamilton Lane, Darien, CT 06820-2809, USA
----------------------------------------------------------------
203-655-2400 (phone/fax) 203-984-6565 (cell)
[EMAIL PROTECTED] (email) TJMarkowitz (AIM ID)
================================================================
- RE: [OT] "Boosting" and other potential researc... Chris Santerre