> Hello, Stevan.
>
Hello Carlo,

> I would say that invalid messages such as those you spoke should be put
> into quarantine and tagged as 'Invalid' or 'Corrupt',
>
That could be a problem. The operator of DSPAM could have decided NOT to
use the quarantine functionality, and if we enforce delivery to
quarantine then we break that rule. We would also bloat the quarantine
for nothing if someone is doing large-scale training where such messages
could be in the training set; a forced delivery to quarantine is
(probably) not what the trainer is looking for.

> so the user can
> decide to receive them later. In fact, this is how dspam handles
> viruses, right?
>
I am not sure. It has been some time since I used anti-virus inside
DSPAM. Does an infected message really get delivered into quarantine? I
had the impression that virus-infected messages get tagged, and only if
the user has enabled quarantine do they get delivered into quarantine.
They are not FORCED into quarantine.

> No tokenizing, just put in quarantine and tagged.
>
Yes. No tokenization is done for virus-infected mails. But I am not sure
about the forced delivery to quarantine.

> What
> would be the advantage of tokenizing such corrupt messages?
>
I don't see a big benefit (if any at all) in tokenizing such a corrupt
message. BUT I don't like the error one gets when such a message is
processed. I would like cleaner handling than the error. Assume one has
an MTA that (for whatever reason) accepts such a corrupt message, and
assume the MTA is processing that message with DSPAM over a pipe. Then
error 22 is going to instruct the MTA to produce an NDR or similar, and
this is something I would like to avoid (if possible).

btw: I was not only writing about tokenizing a message. I am thinking
about classification as well. Sure, a classification needs to tokenize
the message in order to compute a result, but classification does not
mean that the tokens need to be saved.
They are just evaluated, not necessarily saved.

btw2: I am already happy that we were able to reduce the failure rate
from 3.2% down to 0.17% on the TREC05 corpus. I have not tested 3.8.0 to
see how it behaves on those messages, but I would say that 3.8.0 cannot
be much better than 3.9.0. I am 100% sure that 3.9.0 does a better job
of parsing the message than 3.8.0, but that is another issue. I am more
interested in seeing whether 3.8.0 has a lower failure rate than 3.9.0
with the same options. I will probably go ahead and install 3.8.0 on a
test system and compare them, to ensure that we are not worse with 3.9.0
than with 3.8.0.

> Best Regards,
> Carlo Rodrigues
>
-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-devel
