Thank you for the explanation, Thomas. You are right: I chose not to collect, and I am fine with that result. My concern was about the mails that were collected earlier and are overwritten in the process. Their contribution to the corpus is lost.

A possible way around this could be an extra temporary storage location for the mail under investigation, where all the work is done. (There already is a tmp directory.) This would prevent previously collected mails from being overwritten. Once the final results are in, ASSP could decide where to move that mail file, or whether to delete it.
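
To make the difference concrete, here is a minimal Python sketch of the two orders of operation. It is only an illustration, not ASSP's actual code: the function names, the `slot` parameter, and the directory layout are hypothetical, and the assumption that a new mail reuses the file name of an earlier collected mail is taken from the overwrite behaviour described above.

```python
import shutil
import tempfile
from pathlib import Path


def collect_then_check(corpus: Path, slot: str, mail: bytes, no_collect: bool) -> None:
    """Order discussed in this thread: store first, delete afterwards if flagged."""
    target = corpus / slot
    target.write_bytes(mail)  # replaces whatever was collected under this name
    if no_collect:
        target.unlink()  # the earlier mail under this name is gone as well


def stage_then_decide(corpus: Path, staging: Path, slot: str,
                      mail: bytes, no_collect: bool) -> None:
    """Hypothetical variant: work on a staged copy, touch the corpus only to collect."""
    staged = staging / slot
    staged.write_bytes(mail)  # the complete mail is available here for all checks
    if no_collect:
        staged.unlink()  # corpus untouched, earlier mail survives
    else:
        shutil.move(str(staged), str(corpus / slot))  # final decision: collect


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        corpus = Path(root) / "notspam"
        staging = Path(root) / "tmp"
        corpus.mkdir()
        staging.mkdir()
        (corpus / "62.eml").write_bytes(b"earlier ham")
        stage_then_decide(corpus, staging, "62.eml", b"router alert", no_collect=True)
        print((corpus / "62.eml").exists())  # the earlier mail is still in the corpus
```

In both variants the decision is taken only after the complete mail has been written, so the staged version does not require an early decision. What it would cost inside ASSP (extra file moves, interaction with other features that rely on the stored file) is not something this sketch can answer.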

But I humbly admit that I have no idea how much benefit this would bring or at what price it comes. I had better trust in your excellent work. 😊

Best regards
Dirk

---------------------------
From: Thomas Eckardt <thomas.ecka...@thockar.com>
Sent: Thursday, 3 May 2018 08:32
To: ASSP development mailing list <assp-test@lists.sourceforge.net>
Subject: Re: [Assp-test] noCollecting & the ham/spam corpus

Normally assp will collect all mails. There are two states (configuration) to prevent this: noCollecting and noCollectRe. You use noCollecting.

>2018-05-02 09:22:33 [Worker_1] Router@sender.domain matches
>router@sender.domain in noCollecting

Both states are checked in a Post-Mail-Procedure. So the rule is:
- collect all mails
- once processed, check whether either of the two flags was set in the past or has to be set (we now have the complete mail)
- if the noCollect.... is detected, remove the file.

The 'collect all mails in doubt' rule is required for several other features and functions.
A simple example: a noProcessing sender sends a virus and the virus check is skipped because of this flag. Why should assp store a virus in the corpus?

My questions:
1. Yes, this may weaken the corpus, but it was your decision to set noCollecting. Why should assp ignore it?
2. The final decision to collect, delete, or not collect a mail in a file can only be made once we have the complete mail/file (not only maxBytes of the body). So why should assp make an early decision, which can be wrong and may have to be corrected?

So no, it is not possible to change this order.

Thomas

From: "Dirk Kulmsee" <d.kulm...@netgroup.de>
To: "'ASSP development mailing list'" <assp-test@lists.sourceforge.net>
Date: 02.05.2018 18:26
Subject: [Assp-test] noCollecting & the ham/spam corpus
________________________________________

Hi all,

I have a question about the treatment of noCollecting emails.
As I see it, a mail first gets stored in the ham or spam folder, and only after that is the decision made not to collect it. So it is deleted again.

Question: Does this not unnecessarily weaken the corpus? Each time this happens, a validly collected mail is overwritten / deleted. Could things be improved by reversing the order, i.e. first checking whether the mail is to be collected and, if so, then putting it into ham / spam?

Log snippet (from ASSP version 2.6.2 *Fortress* build 18119):

2018-05-02 09:22:33 [Worker_1] Router@sender.domain matches router@sender.domain in noCollecting
2018-05-02 09:22:34 [Worker_1] Router@sender.domain matches router@sender.domain in LocalAddresses_Flat
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain info: DKIM-signature precheck is skipped - DKIM result is ''
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain [Plugin] calling plugin ASSP_AFC
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain ClamAV: scanned 1219 bytes in local message - OK
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain local (no bad attachments)
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] [MessageOK] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain message ok [Mail Alert from Router] -> /opt/assp/notspam/62.eml
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain finished message - received DATA size: 1.95 kByte - sent DATA size: 2.21 kByte
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain disconnected: session:7FD26A9DF408 87.140.79.177 - processing time 1 seconds
2018-05-02 09:22:34 m1-45753-00062 [Worker_1] [TLS-out] 87.140.79.177 <router@sender.domain> to: alarm@recipient.domain info: file /opt/assp/notspam/62.eml was deleted - selected for no collection

Best regards
Dirk

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally privileged and protected in law and are intended solely for the use of the individual to whom it is addressed.
This email was scanned for viruses multiple times. There should be no known virus in this email!
*******************************************************