> Hello, Stevan.
> 
Hello Carlo,

> I would say that invalid messages such as those you spoke of should be put 
> into quarantine and tagged as 'Invalid' or 'Corrupt',
>
That could be a problem. The operator of DSPAM could have decided NOT to use 
the quarantine functionality, and if we enforce delivery to quarantine we 
override that decision. We would also blow up the quarantine for nothing if 
someone is doing large-scale training where such messages are in the training 
set; a forced delivery to quarantine is (probably) not what the trainer is 
looking for.


> so the user can 
> decide to receive them later. In fact, this is how dspam handles 
> viruses, right?
>
I am not sure. It has been some time since I used anti-virus inside DSPAM. Does 
an infected message really get delivered into quarantine? I had the impression 
that virus-infected messages get tagged, and only if the user has enabled 
quarantine do they THEN get delivered into quarantine; they are not FORCED 
into quarantine.


> No tokenizing, just put in quarantine and tagged.
>
Yes. No tokenization is done for virus-infected mails. But I am not sure about 
the forced delivery to quarantine.


> What 
> would be the advantage of tokenizing such corrupt messages?
> 
I don't see a big benefit (if any) in tokenizing such a corrupt message. BUT 
I don't like the error one gets when such a message is processed. I would like 
cleaner handling than that error. Assume an MTA that (for whatever reason) 
accepts such a corrupt message, and assume the MTA processes that message with 
DSPAM over a pipe. Then an error 22 is going to instruct the MTA to produce an 
NDR or similar, and this is something I would like to avoid (if possible).
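
To illustrate the kind of handling meant here, the following is a minimal sketch of a "fail open" pipe wrapper, not anything DSPAM or an MTA ships. The function name filter_fail_open and the idea of passing the filter command as a list are assumptions for illustration only. If the filter exits non-zero, the original message is handed back unchanged instead of the error being propagated to the MTA.

```python
import subprocess


def filter_fail_open(cmd, message):
    """Pipe `message` (bytes) through the filter command `cmd`.

    Returns (data, filtered). If the filter cannot be started or exits
    with a non-zero status, the original message is returned untouched
    and `filtered` is False, so the caller can still deliver it rather
    than generate a bounce.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=message,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,  # we inspect the exit status ourselves
        )
    except OSError:
        # The filter binary is missing or not executable.
        return message, False
    if proc.returncode != 0:
        # The filter reported an error (for example exit status 22).
        return message, False
    return proc.stdout, True
```

A caller would then deliver whatever comes back and could log the cases where the second value is False for later inspection.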

btw: I was not only writing about tokenizing a message. I am thinking about 
classification as well. Sure, classification needs to tokenize the message in 
order to compute a result, but classification does not mean that the tokens 
need to be saved. They are just evaluated, not necessarily saved.
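
As a toy illustration of that distinction (classify only, learn nothing), here is a small sketch. It is not DSPAM's tokenizer or algorithm: the tokenize and classify helpers, the plain count dictionaries, and the crude "sum of counts" score are all assumptions made for the example. The point is only that the token tables are read but never written.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(text, spam_counts, ham_counts):
    """Score `text` against existing token counts without modifying them.

    The tokens are computed and evaluated, then discarded; nothing is
    stored back into `spam_counts` or `ham_counts`.
    """
    spam_score = 0
    ham_score = 0
    for token in tokenize(text):
        spam_score += spam_counts.get(token, 0)
        ham_score += ham_counts.get(token, 0)
    return "spam" if spam_score > ham_score else "innocent"
```

A separate learning step would be the only place where the counts change, so a corrupt message could still get a verdict without polluting the token data.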

btw2: I am already happy that we were able to reduce the failure rate from 
3.2% down to 0.17% on the TREC05 corpus. I have not tested 3.8.0 to see how it 
behaves on those messages, but I would say that 3.8.0 cannot be much better 
than 3.9.0. I am 100% sure that 3.9.0 does a better job of parsing the message 
than 3.8.0, but that is another issue. I am more interested to see whether 
3.8.0 has a lower failure rate than 3.9.0 with the same options. I will 
probably go ahead and install 3.8.0 on a test system and compare them to ensure 
that we are not worse with 3.9.0 than with 3.8.0.



> Best Regards,
> Carlo Rodrigues
> 
-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-devel