While testing my code I stumbled upon a couple of bugs. My test corpus includes a number of messages that break formatting rules. These malformed messages exposed a potential memory leak and a strcmp() call that reads past the end of a short heap buffer. Attached are a valgrind log showing the bugs and a patch that fixes them.

By pumping emails through the library I was able to look for bugs that might trigger a crash, but I still don't have a way to test whether DSPAM is classifying emails correctly. Can anyone point me to a standardized test corpus and the scripts/tools needed to test that corpus using different DSPAM configs? The doc/tests.txt file shows results from 2009 but doesn't say how to reproduce the experiment. If my own test results matched what the core DSPAM developers got, I'd be a happy code monkey... And if I could trigger a test run via `make check`, I'd be the happiest code monkey in the bazaar.
==4891== Thread 5:
==4891== Invalid read of size 1
==4891==    at 0x56C7A85: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:169)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891==  Address 0x8788fb5 is 2 bytes after a block of size 3 alloc'd
==4891==    at 0x4A05FDE: malloc 
(/home/ladar/Playground/valgrind-3.6.1/coregrind/m_replacemalloc/vg_replace_malloc.c:236)
==4891==    by 0x56C6B2A: nt_add 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/nodetree.c:109)
==4891==    by 0x56C7966: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:133)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891== 
==4891== Invalid read of size 1
==4891==    at 0x33CE047A67: vfprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/vfprintf.c:1593)
==4891==    by 0x33CE06EA41: vsnprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/libio/vsnprintf.c:120)
==4891==    by 0x33CE04EA52: snprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/snprintf.c:35)
==4891==    by 0x56C7ABE: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:171)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891==  Address 0x8788fb5 is 2 bytes after a block of size 3 alloc'd
==4891==    at 0x4A05FDE: malloc 
(/home/ladar/Playground/valgrind-3.6.1/coregrind/m_replacemalloc/vg_replace_malloc.c:236)
==4891==    by 0x56C6B2A: nt_add 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/nodetree.c:109)
==4891==    by 0x56C7966: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:133)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891== 
==4891== Invalid read of size 1
==4891==    at 0x33CE073980: _IO_default_xsputn 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/libio/genops.c:480)
==4891==    by 0x33CE047595: vfprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/vfprintf.c:1593)
==4891==    by 0x33CE06EA41: vsnprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/libio/vsnprintf.c:120)
==4891==    by 0x33CE04EA52: snprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/snprintf.c:35)
==4891==    by 0x56C7ABE: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:171)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891==  Address 0x8788fb5 is 2 bytes after a block of size 3 alloc'd
==4891==    at 0x4A05FDE: malloc 
(/home/ladar/Playground/valgrind-3.6.1/coregrind/m_replacemalloc/vg_replace_malloc.c:236)
==4891==    by 0x56C6B2A: nt_add 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/nodetree.c:109)
==4891==    by 0x56C7966: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:133)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891== 
==4891== Invalid read of size 1
==4891==    at 0x33CE073990: _IO_default_xsputn 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/libio/genops.c:479)
==4891==    by 0x33CE047595: vfprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/vfprintf.c:1593)
==4891==    by 0x33CE06EA41: vsnprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/libio/vsnprintf.c:120)
==4891==    by 0x33CE04EA52: snprintf 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/stdio-common/snprintf.c:35)
==4891==    by 0x56C7ABE: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:171)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891==  Address 0x8788fb7 is 4 bytes after a block of size 3 alloc'd
==4891==    at 0x4A05FDE: malloc 
(/home/ladar/Playground/valgrind-3.6.1/coregrind/m_replacemalloc/vg_replace_malloc.c:236)
==4891==    by 0x56C6B2A: nt_add 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/nodetree.c:109)
==4891==    by 0x56C7966: _ds_tokenize_ngram 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:133)
==4891==    by 0x56C781C: _ds_tokenize 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/tokenizer.c:92)
==4891==    by 0x56BAB9C: _ds_operate 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:876)
==4891==    by 0x56BA2EE: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:573)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891== 
==4891== Thread 1:
==4891== 56 bytes in 1 blocks are definitely lost in loss record 4 of 14
==4891==    at 0x4A05FDE: malloc 
(/home/ladar/Playground/valgrind-3.6.1/coregrind/m_replacemalloc/vg_replace_malloc.c:236)
==4891==    by 0x33CE07F761: strdup 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/string/strdup.c:43)
==4891==    by 0x56B56A8: _ds_decode_headers 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/decode.c:439)
==4891==    by 0x56B52CA: _ds_actualize_message 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/decode.c:270)
==4891==    by 0x56BA1AA: dspam_process 
(/home/ladar/Lavabit/magma.so/sources/dspam/src/libdspam.c:539)
==4891==    by 0x47B347: dspam_check 
(/home/ladar/Lavabit/magma/.debug/../providers/checkers/dspam.c:103)
==4891==    by 0x436BDF: smtp_accept_message 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/accept.c:345)
==4891==    by 0x440288: smtp_data_inbound 
(/home/ladar/Lavabit/magma/.debug/../servers/smtp/smtp.c:972)
==4891==    by 0x4A5A56: dequeue 
(/home/ladar/Lavabit/magma/.debug/../engine/controller/queue.c:89)
==4891==    by 0x33CEC077E0: start_thread 
(/usr/src/debug/glibc-2.12-2-gc4ccff1/nptl/pthread_create.c:301)
==4891==    by 0x33CE0E577C: clone 
(/usr/src/debug////////glibc-2.12-2-gc4ccff1/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:115)
==4891== 
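
The "definitely lost" record above points at a strdup() copy made in _ds_decode_headers. The general pattern behind such a report can be sketched as follows. This is an illustration only, not the actual decode.c code: `demo_header`, `demo_copy` and `demo_reset_after_failed_parse` are hypothetical names, and `demo_copy` merely stands in for strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a header record; not the real DSPAM struct. */
struct demo_header {
    char *original_data;
};

/* Plays the role of strdup(): returns a heap copy of s, or NULL. */
static char *demo_copy(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/*
 * Mimics an early-exit path: a copy of the raw header is saved, the parse
 * then fails, and the field is reset to NULL. Freeing the copy before the
 * reset keeps it from being orphaned.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
int demo_reset_after_failed_parse(struct demo_header *h, const char *raw)
{
    int was_null;

    assert(h != NULL && raw != NULL);
    was_null = (h->original_data == NULL);

    if (was_null) {
        h->original_data = demo_copy(raw);
        if (h->original_data == NULL)
            return -1;
    }

    /* Pretend the encoded-word parse failed at this point. */
    if (was_null) {
        free(h->original_data);    /* without this call the copy leaks */
        h->original_data = NULL;
    }
    return 0;
}
```

Dropping the free() call from the sketch reproduces the kind of loss record shown in the log: the only pointer to the copy is overwritten with NULL.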
diff -r 22546f626e64 sources/dspam/src/decode.c
--- a/sources/dspam/src/decode.c        Fri Aug 19 04:28:01 2011 -0500
+++ b/sources/dspam/src/decode.c        Sat Aug 27 03:26:04 2011 -0500
@@ -445,6 +445,8 @@ _ds_decode_headers (ds_message_part_t bl
         ptr = strtok_r (NULL, "?", &ptrptr);
         dptr = strtok_r (NULL, "?", &ptrptr);
         if (!dptr) {
+          if (was_null && header->original_data != NULL)
+               free(header->original_data);
           if (was_null)
             header->original_data = NULL;
           continue;
diff -r 22546f626e64 sources/dspam/src/nodetree.c
--- a/sources/dspam/src/nodetree.c      Fri Aug 19 04:28:01 2011 -0500
+++ b/sources/dspam/src/nodetree.c      Sat Aug 27 03:26:04 2011 -0500
@@ -106,7 +106,8 @@ nt_add (struct nt *nt, void *data)
   if (nt->nodetype == NT_CHAR)
   {
     long size = strlen ((char *) data) + 1;
-    vptr = malloc (size);
+    /* vptr is compared with 'From' and 'X-DSPAM-' even if data is shorter; but a larger buffer is allocated to prevent a comparison against garbage */
+    vptr = malloc (size < 16 ? 16 : size);
     if (vptr == NULL)
     {
       LOG (LOG_CRIT, ERR_MEM_ALLOC);
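
The nodetree.c hunk avoids the invalid read by over-allocating short strings. An alternative, where the caller can be changed, is to make the comparison itself respect the string's length. The following is a minimal sketch under that assumption; `demo_has_prefix` is a hypothetical helper, not part of DSPAM, and it has not been checked against the code at tokenizer.c:169.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the NUL-terminated string s starts with prefix, else 0.
 * strncmp() does not compare characters that follow a NUL, so a short s
 * (for example a 3-byte allocation holding "To") is never read past its
 * terminator, whatever the length of prefix.
 */
int demo_has_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

A caller would write `demo_has_prefix(heading, "X-DSPAM-")` rather than indexing into the heading at a fixed offset. The trade-off is one extra strlen() per comparison, in exchange for leaving the allocation size in nt_add alone.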
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel