I'm sorry, Andrea - I used the terms in a somewhat confusing way.
'corpusnorm' is not quite the right word if we look at all files in the
corpus - it would be better to say the resulting 'norm' of the spamdb and HMM.
In previous versions the limit was MaxFiles - so if you had more
files in the folders, the 'corpusnorm' was not the norm of all files - only
of MaxFiles of them.
Now assp uses as many files as possible (at most MaxFiles), but fewer if that
is required to get the wanted norm ('corpusnorm') for the spamdb and HMM.
The goal is that you don't have to care about the folders - assp will use
and/or ignore and/or delete the right files at the right time.
>Hmm... I see now, basically with the latest change you added some logic
>so that older files (which should otherwise be discarded) are ignored
>by the rebuild process... am I right ?
I'll explain a bit more:
- all folders are processed "newest files first"
- both error folders are fully processed, up to MaxFiles
As the result of processing these first two folders we get a weight
(spam/ham). Now we know where we are: we have a current weight, a wanted
weight, and we know how many files are in the spam and notspam folders. Now
assp calculates the approximate maximum number of files in the spam folder
that could be used, assuming that all files in the notspam folder will at
least be enough to reach the wanted target norm.
The spam folder is processed.
Now we know the new spam/ham weight and can calculate more exactly how
many of the files in the notspam folder are required to reach the wanted
target norm.
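To make the two estimation steps concrete, here is a minimal sketch in Python. It is not the actual assp code. It assumes that the 'weight' is a word count, that the norm is spam weight divided by ham weight, and that an average word count per file is known. All function and parameter names are made up for illustration.

```python
import math


def estimate_spam_files(spam_w, ham_w, n_spam, n_notspam,
                        target_norm, max_files, avg_spam_w, avg_ham_w):
    """Step 1: approximate how many spam files may be used, assuming all
    notspam files (capped at max_files) will be used afterwards."""
    usable_notspam = min(n_notspam, max_files)
    expected_ham_w = ham_w + usable_notspam * avg_ham_w
    needed_spam_w = target_norm * expected_ham_w - spam_w
    count = math.ceil(needed_spam_w / avg_spam_w) if needed_spam_w > 0 else 0
    return min(count, n_spam, max_files)


def estimate_notspam_files(spam_w, ham_w, n_notspam,
                           target_norm, max_files, avg_ham_w):
    """Step 2: with the spam weight now known, approximate how many
    notspam files are required to reach the target norm."""
    needed_ham_w = spam_w / target_norm - ham_w
    count = math.ceil(needed_ham_w / avg_ham_w) if needed_ham_w > 0 else 0
    return min(count, n_notspam, max_files)


# Hypothetical numbers: weights left over from the two error folders,
# 10000 spam files, 2000 notspam files, about 100 words per file.
spam_w, ham_w = 1000, 500
use_spam = estimate_spam_files(spam_w, ham_w, 10000, 2000,
                               1.0, 14000, 100, 100)
# In a real rebuild the new spam weight would be measured while the spam
# folder is processed; here it is simulated from the assumed average.
spam_w += use_spam * 100
use_notspam = estimate_notspam_files(spam_w, ham_w, 2000, 1.0, 14000, 100)
print(use_spam, use_notspam)
```

With these made-up numbers only 1995 of the 10000 spam files are used, together with all 2000 notspam files, so the resulting norm ends up at the target of 1.0 instead of being dominated by spam.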
I'm impressed by how exactly it worked in my case.
Let's see how it works.
Thomas
From: Thomas Eckardt <thomas.ecka...@thockar.com>
To: ASSP development mailing list <assp-test@lists.sourceforge.net>
Date: 11.09.2012 16:55
Subject: [Assp-test] Reply: Re: Reply: strange ASSP behavior
>I see, so, basically, you're saying that the weight reported in the
>"rebuild report" isn't correct ?
No - the values were shown correctly. But ASSP used all files (up to
MaxFiles) even when it would have been better to use somewhat fewer (from
here or there) to get a better corpusnorm.
Thomas
From: Grayhat <gray...@gmx.net>
To: assp-test@lists.sourceforge.net
Date: 11.09.2012 15:25
Subject: Re: [Assp-test] Reply: strange ASSP behavior
> Andrea,
Hi there, Thomas, we are on the public list, aren't we :) ?
> your request was very logical.
Well... to tell it all, I reported such behavior here and
there, but then, I didn't really pay attention to it... until I was
forced to set up a script, scheduled at intervals, to "trim" the corpus
and restore it to "normal" and, honestly, given that ASSP has options
to deal with this, I think ASSP *should* deal with this :) and keep the
corpus balanced
> Why is assp not able to produce a fine corpusnorm/spamdb/HMM, if all
> information is available and the folders are full of files?
> Had a sleepless night. I think I've found a way to fix this.
Now ... you make me feel somewhat guilty !! Sleep is a need and
honestly, causing a sleepless night isn't exactly something I like to
do (ok, given that the night was spent thinking about code <grin>)
> After the error folders are processed, a temporary corpusnorm is
> calculated. The files in the spam and notspam folders are counted -
> and depending on the temp-corpusnorm, the spam-file-count and
> notspam-file-count, the approximate required count of spam files is
> calculated. Once these spam files have been processed - based on the
> needed notspam word count - the approximate required count of notspam
> files is calculated.
>
> So (I hope), even if a machine gets too many or too few spams over
> time, this logic will be able to ensure a fine corpusnorm.
I see, so, basically, you're saying that the weight reported in the
"rebuild report" isn't correct ?!? Not that it's an issue, I can live
with that but... did I get it right ? (sorry if I didn't but last night
I slept 2 hours +/- [yeah, I know, but I was dealing with some *darn*
UTM issues and had to "protect the innocent"] and today I had to travel
@ a customer site... just got back) If so, then, maybe slightly
changing the rebuild code to emit correct values may be a good idea :)
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally
privileged and protected in law and are intended solely for the use of the
individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no
known virus in this email!
*******************************************************