On 9/3/2011 10:00 AM, Stevan Bajic wrote:
Does it matter if the corpus is actively maintained or not? For testing
email classification, the only thing that counts is the quality of the
corpus.
If the corpus is used to compare different filters, the ratings will
mean more if it's representative of today's junk mail...
Something like compressionratings.com -- but for email
classification. Something I could test my configuration/build against
to see where it ranks.
You can use dspam_train for that. It will spit out the score after it
has finished training. Those numbers can be compared against the
results of others who have trained on the same corpus.
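Comparing runs is easier when the raw counts are reduced to a few rates. The summary format differs between versions, so the minimal sketch below does not parse dspam_train output. It assumes you copy the spam and innocent totals plus the two misclassification counts by hand. The class and field names are hypothetical, and the counts in the usage line are made-up illustration values, not measured results.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Counts copied by hand from one training run (names are hypothetical)."""
    spam_total: int
    ham_total: int
    missed_spam: int      # spam classified as innocent (false negatives)
    false_positives: int  # innocent mail classified as spam

    def spam_catch_rate(self) -> float:
        return (self.spam_total - self.missed_spam) / self.spam_total

    def false_positive_rate(self) -> float:
        return self.false_positives / self.ham_total

    def accuracy(self) -> float:
        total = self.spam_total + self.ham_total
        errors = self.missed_spam + self.false_positives
        return (total - errors) / total


# Illustrative numbers only.
run = TrainingRun(spam_total=1000, ham_total=1000, missed_spam=10, false_positives=2)
print(f"catch={run.spam_catch_rate():.3f} fp={run.false_positive_rate():.3f} "
      f"acc={run.accuracy():.3f}")
# prints "catch=0.990 fp=0.002 acc=0.994"
```

Keeping the false positive rate separate from overall accuracy matters for spam filtering, since two builds with the same accuracy can differ a lot in how much legitimate mail they lose.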
What I was hoping to find is a script that runs dspam_train, logs the
result and resets the backend datastore. And this script would come with
a collection of dspam.conf files with different combinations of features
and training modes. Loops that do signature training and group
inoculation would be nice too.
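The loop described above (install a config variant, train, log, reset) could look roughly like the following Python sketch. It is not an existing DSPAM tool. It assumes DSPAM reads its configuration from a single fixed path (`active_conf`), that some site-specific command can empty the backend datastore (`reset_cmd`), and that the training invocation is passed in whole as `train_cmd`; no dspam_train arguments are assumed here. The function name and parameters are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def run_matrix(conf_variants, active_conf, train_cmd, reset_cmd, log_path):
    """Train once per dspam.conf variant, logging output and exit status.

    conf_variants -- paths to candidate dspam.conf files (hypothetical layout)
    active_conf   -- path the DSPAM build reads its config from (build-specific)
    train_cmd     -- argv list for the training run, e.g. a dspam_train call
    reset_cmd     -- argv list that empties the backend datastore (site-specific)
    """
    results = {}
    with open(log_path, "w", encoding="utf-8") as log:
        for conf in sorted(Path(p) for p in conf_variants):
            # Start every variant from an empty datastore.
            subprocess.run(reset_cmd, check=True)
            # Install this variant where DSPAM will read it.
            shutil.copyfile(conf, active_conf)
            proc = subprocess.run(train_cmd, capture_output=True, text=True)
            log.write(f"=== {conf.name} (exit {proc.returncode}) ===\n")
            log.write(proc.stdout)
            log.write(proc.stderr)
            results[conf.name] = proc.returncode
    return results
```

A call might pass something like `glob.glob("confs/*.conf")` as the variants, with every path and command filled in for the local build. Signature training and group inoculation runs would simply be further config variants or further `train_cmd` values fed through the same loop.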
String functions or memory allocator errors? In what sense? Do you
think that your build would trigger such an error while it will not
trigger on another setup?
Every flavor has its quirks. Automated tests are a good way to find
them. For example, the RHEL 6 malloc function will mmap large chunks of
address space for threaded applications. This causes problems for
anyone who relies on resource limits to keep a single app from bogging
down a server. See https://wwws.clamav.net/bugzilla/show_bug.cgi?id=1990
As I have written in the past: It is hard to do the same with DSPAM
because of the storage backend that DSPAM uses. You cannot just run make
check as part of the build. It is not that easy.
Not everyone compiles that storage driver. Someone might choose
to use just MySQL and nothing else.
Creating code to test every possible setup is a big undertaking -- but
adding checks which test at least one storage driver shouldn't take
long? Since the file system driver is universal, it might be a good
candidate to start. It would allow people using other backends in
production to at least test the library code. You could even set up the
check target to build test binaries using the file system driver. Then
make check will work even if the configure options specify a database
backend.
None of those tools has artificial intelligence the way DSPAM does.
ClamAV is just checking hashes. If DSPAM worked the same way, then
making a check suite would be easy, but it does not. We could, however,
code a suite that checks whether the text "I am a test" produces
the proper result with each of the various tokenizers. But that's all.
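Even that limited suite would be a golden-output check: feed a fixed text to each tokenizer and compare against output recorded from a known-good build. The Python sketch below only illustrates the shape of such a check. Both tokenizers are simplified stand-ins written for this example, not DSPAM's C tokenizers, and the `+` pair format is an assumption. A real suite would call into the library and store its actual output as the golden data.

```python
def word_tokens(text):
    """Unigrams: a simplified stand-in for a word tokenizer (whitespace split)."""
    return text.split()


def chain_tokens(text):
    """Unigrams plus adjacent word pairs joined by '+' (simplified, assumed format)."""
    words = word_tokens(text)
    return words + [f"{a}+{b}" for a, b in zip(words, words[1:])]


# Expected output for the fixed sample text. A real suite would record these
# from a known-good build and compare each new build against them.
GOLDEN = {
    "word": (word_tokens, ["I", "am", "a", "test"]),
    "chain": (chain_tokens, ["I", "am", "a", "test", "I+am", "am+a", "a+test"]),
}


def check_tokenizers(text="I am a test"):
    """Return (name, expected, got) for every tokenizer whose output drifted."""
    failures = []
    for name, (tokenize, expected) in GOLDEN.items():
        got = tokenize(text)
        if got != expected:
            failures.append((name, expected, got))
    return failures


failures = check_tokenizers()
for name, expected, got in failures:
    print(f"FAIL {name}: expected {expected}, got {got}")
print("ok" if not failures else f"{len(failures)} tokenizer(s) failed")
```

Because the check compares exact token lists, it catches regressions in tokenization even though it says nothing about classification quality, which still needs a corpus.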
'bajic'? LOL. I am just a coding monkey. That's all. I certainly don't
want to taint users' database backends with my family name as a password.
If you are so ultra giga horny for tests then I could build a database
for MySQL and one for PostgreSQL that has tokens and dump that data
and upload it to Sourceforge. Users could then use that data to do
some limited testing.
If it came with the test script I mentioned above, I'd be happy!
Having to test DSPAM releases manually means I only upgrade the library
during development cycles. That's my only chance to manually test
library releases for problems. That means long delays between updates,
and I still miss problems because I'm only testing the code I'm developing.
The more testing I can script, the more comfortable I will be pushing a
new release out to production outside of my personal dev schedule. Because
DSPAM doesn't currently ship with _any_ automated tests, I don't trust a
new release in production until I've spent some quality time testing it.
That's why I still use DSPAM v3.6.8 in production! I upgraded to
v3.9.0-RC2, then v3.10.0 and now v3.10.1 on my development machine and
just wanted to add more automated checks to my build scripts so I can
stay in sync going forward.
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel