On 9/3/2011 10:00 AM, Stevan Bajic wrote:

Does it matter whether the corpus is actively maintained or not? For testing email classification, the only thing that counts is the quality of the corpus.

If the corpus is used to compare different filters, the ratings will mean more if it's representative of today's junk mail...

Something like compressionratings.com -- but for email classification. Something I could test my configuration/build against to see where it ranks.

You can use dspam_train for that. It will spit out the score after it has finished training. Those numbers can be compared against the results of others who have used the same corpus for training.

What I was hoping to find is a script that runs dspam_train, logs the result and resets the backend datastore. And this script would come with a collection of dspam.conf files with different combinations of features and training modes. Loops that do signature training and group inoculation would be nice too.
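A minimal sketch of such a script, under stated assumptions: the datastore is file-based and lives under a single directory, DSPAM reads its settings from one active config path, and the training command is passed in as an argument list. The function name `run_matrix` and all of its parameters are hypothetical, not part of DSPAM.

```python
import shutil
import subprocess
from pathlib import Path


def run_matrix(conf_files, train_cmd, active_conf, data_dir, log_path,
               runner=subprocess.run):
    """Run one training pass per config file and log each result.

    Returns a dict mapping each config path to the trainer's exit code.
    """
    results = {}
    with open(log_path, "a") as log:
        for conf in conf_files:
            # Reset the datastore so every run starts untrained.
            # Assumption: a file-based backend stored under data_dir.
            shutil.rmtree(data_dir, ignore_errors=True)
            Path(data_dir).mkdir(parents=True, exist_ok=True)

            # Assumption: DSPAM picks up its settings from active_conf.
            shutil.copyfile(conf, active_conf)

            # train_cmd is supplied by the caller, e.g. a dspam_train
            # invocation; its arguments are not hard-coded here.
            proc = runner(train_cmd, capture_output=True, text=True)
            results[str(conf)] = proc.returncode

            log.write(f"=== {conf} (exit {proc.returncode}) ===\n")
            log.write(proc.stdout)
            log.write("\n")
    return results
```

It could be called with something like `run_matrix(sorted(Path("confs").glob("*.conf")), ["dspam_train", "testuser", "corpus/spam", "corpus/nonspam"], "/etc/dspam.conf", "/var/dspam/data", "results.log")`. Those paths, the user name, and the dspam_train argument order are assumptions to check against the local install. A database backend would need its own reset step in place of the directory wipe.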

String functions or memory allocator errors? In what sense? Do you think that your build would trigger such an error while it will not trigger on another setup?


Every flavor has its quirks. Automated tests are a good way to find them. For example, the RHEL 6 malloc function will mmap large chunks of address space for threaded applications. This will cause problems for someone who relies on resource limits to keep a single app from bogging down a server. See https://wwws.clamav.net/bugzilla/show_bug.cgi?id=1990

As I have written in the past: It is hard to do the same with DSPAM because of the storage backend that DSPAM uses. You cannot just run make check during configure. It is not that easy.

Not everyone is compiling that storage driver. Someone might choose to just use MySQL and nothing else.

Creating code to test every possible setup is a big undertaking -- but adding checks which test at least one storage driver shouldn't take long? Since the file system driver is universal, it might be a good candidate to start with. It would allow people using other backends in production to at least test the library code. You could even set up the check target to build test binaries using the file system driver. Then make check will work even if the configure options specify a database backend.

None of those tools have artificial intelligence like DSPAM has. ClamAV is just checking hashes. If DSPAM worked the same way then making a check suite would be easy, but it does not. We could however go on and code a suite that checks if the text "I am a test" is producing the proper result when using the various tokenizers. But that's all.
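As a rough illustration of what such a tokenizer check could look like, here is a toy sketch. The two functions below are simplified stand-ins written for this example (plain word splitting, and words plus adjacent pairs joined with "+"); they are not DSPAM's tokenizer code, and a real suite would call the library and compare against its actual output.

```python
def word_tokens(text):
    """Toy word tokenizer: one token per whitespace-separated word."""
    return text.split()


def chained_tokens(text):
    """Toy chained tokenizer: single words plus adjacent word pairs."""
    words = text.split()
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


# Expected output for the fixed input, one entry per tokenizer.
EXPECTED = {
    word_tokens: ["I", "am", "a", "test"],
    chained_tokens: ["I", "am", "a", "test", "I+am", "am+a", "a+test"],
}


def check_tokenizers(text="I am a test"):
    """Return {tokenizer name: True/False} for the fixed-input check."""
    return {fn.__name__: fn(text) == want for fn, want in EXPECTED.items()}
```

A check like this only covers deterministic tokenizing, not classification quality, which matches the limit described above.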

'bajic'? LOL. I am just a coding monkey. That's all. I for sure don't want to taint users' database backends with my family name as password. If you are so ultra giga horny for tests then I could build a database for MySQL and one for PostgreSQL that has tokens, dump that data and upload it to Sourceforge. Users could then use that data to do some limited testing.


If it came with the test script I mentioned above, I'd be happy!

Having to test DSPAM releases manually means I only upgrade the library during development cycles. That's my only chance to manually test library releases for problems. That means long delays between updates, and I still miss problems because I'm only testing the code I'm developing.

The more testing I can script, the more comfortable I will be pushing a new release out to production outside of my personal dev schedule. Because DSPAM doesn't currently ship with _any_ automated tests, I don't trust a new release in production until I've spent some quality time testing it. That's why I still use DSPAM v3.6.8 in production! I upgraded to v3.9.0-RC2, then v3.10.0 and now v3.10.1 on my development machine and just wanted to add more automated checks to my build scripts so I can stay in sync going forward.


_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
