Thank you for the explanation, Thomas.

You are right, I chose not to collect and I am fine with that result. My 
concern was about the mails that were collected earlier and are overwritten in 
the process. Their contribution to the corpus is lost. A possible way around 
this could be an extra temporary storage location for the mail under 
investigation, where all the work is done. (There already is a tmp directory.) 
This would prevent existing collected mails from being overwritten. Once the 
final results are in, ASSP could decide where to move that mail file, or 
whether to delete it.
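
To make the staging idea concrete, here is a minimal Python sketch. It is not 
ASSP code (ASSP is written in Perl) and makes no claim about how ASSP handles 
files internally. The function names, the directory arguments and the 
no_collect flag are all hypothetical and exist only for illustration.

```python
import os
import shutil
import tempfile


def stage_mail(tmp_dir, raw_mail):
    """Write the incoming mail to a private file in tmp_dir; return its path."""
    fd, path = tempfile.mkstemp(suffix=".eml", dir=tmp_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw_mail)
    return path


def finish_mail(staged_path, corpus_dir, slot_name, no_collect):
    """Hypothetical post-mail step: delete the staged file or move it.

    Returns the final corpus path, or None if the mail was not collected.
    """
    if no_collect:
        # The corpus slot is never touched, so an older mail in it survives.
        os.remove(staged_path)
        return None
    target = os.path.join(corpus_dir, slot_name)
    # Only at this point is an older file in the slot replaced.
    shutil.move(staged_path, target)
    return target
```

With this order, a mail that ends up flagged for no collection never reaches 
the ham or spam folder, so nothing already stored there is displaced by it.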

But I humbly admit that I have no idea how much benefit this would bring or at 
what price it would come. I had better trust in your excellent work. 😊

Best regards
Dirk

---------------------------

From: Thomas Eckardt <thomas.ecka...@thockar.com> 
Sent: Thursday, 3 May 2018 08:32
To: ASSP development mailing list <assp-test@lists.sourceforge.net>
Subject: Re: [Assp-test] noCollecting & the ham/spam corpus

Normally assp collects all mails. 
There are two configuration settings that prevent this: noCollecting and 
noCollectRe. You use noCollecting. 

>2018-05-02 09:22:33 [Worker_1] mailto:Router@sender.domain matches
>mailto:router@sender.domain in noCollecting

Both settings are checked in a post-mail procedure. So the rule is: collect 
all mails; once a mail is processed, check whether either of the two flags was 
set earlier or has to be set now (at that point we have the complete mail); if 
a noCollect... flag is detected, remove the file. 
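
The order described above can be modelled roughly as follows. This is a 
minimal Python sketch and not ASSP's actual code. The function names are 
hypothetical, and a plain set of lowercase addresses stands in for the real 
noCollecting matching, which is an assumption made only to keep the example 
short.

```python
import os


def collect(corpus_dir, slot_name, raw_mail):
    """Step 1: store every mail right away, replacing whatever is in the slot."""
    path = os.path.join(corpus_dir, slot_name)
    with open(path, "wb") as handle:
        handle.write(raw_mail)
    return path


def post_mail_check(path, sender, no_collecting):
    """Step 2: with the complete mail available, honour the noCollect flag.

    Returns True if the file was kept, False if it was removed.
    """
    # Assumption: case-insensitive exact match against a set of addresses.
    if sender.lower() in no_collecting:
        os.remove(path)
        return False
    return True
```

Because step 1 always writes the file, a mail that was stored in the same slot 
earlier is already gone by the time step 2 removes the new one.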

The 'collect all mails in doubt' rule is required for several other features 
and functions. A simple example: a noProcessing sender sends a virus and virus 
check is skipped because of this flag. Why should assp store a virus in the 
corpus? 

My questions: 

1. Yes, this may weaken the corpus, but it was your decision to set 
noCollecting. Why should assp ignore it? 
2. The final decision to collect a mail in a file or to delete it can only be 
made once we have the complete mail/file (not only maxBytes of the body). So 
why should assp make an early decision, which can be wrong and may have to be 
corrected later? 

So, no, it is not possible to change this order. 

Thomas 




From:        "Dirk Kulmsee" <mailto:d.kulm...@netgroup.de> 
To:        "'ASSP development mailing list'" 
<mailto:assp-test@lists.sourceforge.net> 
Date:        02.05.2018 18:26 
Subject:        [Assp-test] noCollecting & the ham/spam corpus 
________________________________________



Hi all,

I have a question about the treatment of noCollecting emails. As I see it, a
mail first gets stored in the ham or spam folder, and only after that is the
decision made not to collect it. So it is deleted again.
Question: Does this not unnecessarily weaken the corpus? Each time this
happens, a validly collected mail is overwritten / deleted. Could things be
improved by reversing the order, i.e. first checking whether the mail is to be
collected and only then putting it into ham / spam?

Log snippet (from ASSP version 2.6.2  *Fortress*  build 18119):

2018-05-02 09:22:33 [Worker_1] mailto:Router@sender.domain matches mailto:router@sender.domain in noCollecting
2018-05-02 09:22:34 [Worker_1] mailto:Router@sender.domain matches mailto:router@sender.domain in LocalAddresses_Flat
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain info: DKIM-signature precheck is skipped - DKIM result is ''
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain [Plugin] calling plugin ASSP_AFC
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain ClamAV: scanned 1219 bytes in local message - OK
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain local (no bad attachments)
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] [MessageOK] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain message ok [Mail Alert from Router] -> /opt/assp/notspam/62.eml
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain finished message - received DATA size: 1.95 kByte - sent DATA size: 2.21 kByte
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain disconnected: session:7FD26A9DF408 87.140.79.177 - processing time 1 seconds
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <mailto:router@sender.domain> to: mailto:alarm@recipient.domain info: file /opt/assp/notspam/62.eml was deleted - selected for no collection


Best regards
Dirk


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Assp-test mailing list
mailto:Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test








