Re: [Dspam-devel] A memory leak and the reading of uninitialized bytes in 3.10.1... patch attached

Ladar Levison Thu, 08 Sep 2011 06:22:40 -0700

On 9/3/2011 10:00 AM, Stevan Bajic wrote:

Does it matter if the corpus is actively maintained or not? For testing email classification the only thing that counts is the quality of the corpus.

If the corpus is used to compare different filters, the ratings will mean more if it's representative of today's junk mail...

Something like compressionratings.com -- but for email classification. Something I could test my configuration/build against to see where it ranks.
You can use dspam_train for that. It will spit out the score after it has finished with training. Those numbers can be used to compare against others that have used the same corpus for training.

What I was hoping to find is a script that runs dspam_train, logs the result and resets the backend datastore. And this script would come with a collection of dspam.conf files with different combinations of features and training modes. Loops that do signature training and group inoculation would be nice too.
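A minimal sketch of such a script, in Python, could look like the following. Everything here is an assumption for illustration: the function name `run_matrix`, the `build_command` callback (which would return the dspam_train command line for a given install), and the idea that the datastore is a directory that can simply be wiped. The sketch does not parse the score, since the output format is not described in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def run_matrix(configs, build_command, data_dir, log_dir):
    """Run one training pass per config file, each from an empty datastore.

    configs       -- paths to the dspam.conf variants to try
    build_command -- hypothetical callback: config path -> argv list to run
    data_dir      -- assumed file-based datastore directory, wiped each run
    log_dir       -- where the raw output of each run is written
    """
    data_dir, log_dir = Path(data_dir), Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for config in map(Path, configs):
        # Reset the backend so earlier training does not affect this run.
        shutil.rmtree(data_dir, ignore_errors=True)
        data_dir.mkdir(parents=True)
        proc = subprocess.run(build_command(config),
                              capture_output=True, text=True)
        # Keep the raw output; comparing scores is left to the reader.
        (log_dir / f"{config.stem}.log").write_text(proc.stdout + proc.stderr)
        results[config.name] = proc.returncode
    return results
```

How a given dspam.conf gets selected (copying it into place before the run, for instance) is left to `build_command` or a small wrapper. A database backend would need its own reset step in place of the directory wipe.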

String functions or memory allocator errors? In what sense? Do you think that your build would trigger such an error while it will not trigger on another setup?

Every flavor has its quirks. Automated tests are a good way to find them. For example, the RHEL 6 malloc function will mmap large chunks of address space for threaded applications. This will cause problems for someone who relies on resource limits to keep a single app from bogging down a server. See https://wwws.clamav.net/bugzilla/show_bug.cgi?id=1990
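One way a scripted test could surface this kind of quirk is to run the program under the same address-space limit used in production and see whether it still starts and completes. The sketch below assumes a POSIX system where Python's `resource` module is available; the helper name `run_with_address_limit` and any limit value passed to it are hypothetical.

```python
import resource
import subprocess


def run_with_address_limit(cmd, limit_bytes, env=None):
    """Run cmd with RLIMIT_AS capped at limit_bytes.

    Returns the CompletedProcess so the caller can check the exit
    status and any allocator errors written to stderr.
    """
    def apply_limit():
        # Executed in the child just before exec, so only the child
        # process is limited, not the test harness itself.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit, env=env,
                          capture_output=True, text=True)
```

glibc also reads a `MALLOC_ARENA_MAX` environment variable that caps the number of malloc arenas. Passing it through `env` is one thing to try, though whether it helps in a particular setup is an assumption to verify rather than a given.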

As I have written in the past: It is hard to do the same with DSPAM because of the storage backend that DSPAM uses. You cannot just make check during configure. It is not that easy.
Not everyone is compiling that storage driver. Someone might choose to just use MySQL and nothing else.

Creating code to test every possible setup is a big undertaking -- but adding checks which test at least one storage driver shouldn't take long? Since the file system driver is universal, it might be a good candidate to start. It would allow people using other backends in production to at least test the library code. You could even set up the check target to build test binaries using the file system driver. Then make check will work even if the configure options specify a database backend.

None of those tools have artificial intelligence like DSPAM has. ClamAV is just checking hashes. If DSPAM would work the same way then making a check suite would be easy, but it is not. We could however go on and code a suite that checks if the test "I am a test" is producing the proper result when using the various tokenizers. But that's all.
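The shape of such a tokenizer check can be sketched as a small table-driven test. The two tokenizers below are toy stand-ins written for this example, not DSPAM's actual tokenizers, and the expected token lists only describe the toys. A real suite would call into the library and compare against known-good output.

```python
def word_tokens(text):
    """Toy tokenizer: one token per whitespace-separated word."""
    return text.split()


def chained_tokens(text):
    """Toy tokenizer: single words plus each adjacent pair joined by '+'."""
    words = text.split()
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


# Expected output of each toy tokenizer for the fixed input "I am a test".
EXPECTED = {
    word_tokens: ["I", "am", "a", "test"],
    chained_tokens: ["I", "am", "a", "test", "I+am", "am+a", "a+test"],
}


def check_tokenizers(text="I am a test"):
    """Return the names of tokenizers whose output differs from EXPECTED."""
    return [fn.__name__ for fn, want in EXPECTED.items() if fn(text) != want]
```

An empty list from `check_tokenizers()` means every tokenizer in the table produced the expected tokens. A non-empty list names the ones that regressed.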
'bajic'? LOL. I am just a coding monkey. That's all. I for sure don't want to taint users' database backends with my family name as password. If you are so ultra giga horny for tests then I could build a database for MySQL and one for PostgreSQL that has tokens and dump that data and upload it to Sourceforge. Users could then use that data to do some limited testing.


If it came with the test script I mentioned above, I'd be happy!

Having to test DSPAM releases manually means I only upgrade the library during development cycles. That's my only chance to manually test library releases for problems. That means long delays between updates, and I still miss problems because I'm only testing the code I'm developing.

The more testing I can script, the more comfortable I will be pushing a new release out to production outside of my personal dev schedule. Because DSPAM doesn't currently ship with _any_ automated tests, I don't trust a new release in production until I've spent some quality time testing it. That's why I still use DSPAM v3.6.8 in production! I upgraded to v3.9.0-RC2, then v3.10.0 and now v3.10.1 on my development machine and just wanted to add more automated checks to my build scripts so I can stay in sync going forward.


