bonehead user sends 5000 -> LocalFrequencyInt and the following configs

regular user sends 5000 -> noCollecting , noCollectRe ...........

This is not a coding task - this is an organizing and configuration task.
As I always say - RTFM!

>then delete as you already do files in
>excess of the maximum total number of files? 

Oldest first - no content check.

>that our notspam corpus remains diverse

Having the 100% identical mail body 5000 times in one folder is the same,
for HMM and Bayes, as having that mail in the folder one time.
Having the same mail one time in the opposite folder eliminates all
5000 for HMM and Bayes.
BTW: this is independent of the filename or subject.

This is not new (it has been this way for more than 10 years), because it
is one of the basic concepts of HMM and Bayes.
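
A minimal sketch of the point above, in Python. This is not ASSP's code
(ASSP is written in Perl); `body_key` and `effective_corpus` are
hypothetical names. It only assumes what is stated above: identical bodies
count once per folder, and a body present in both folders is dropped from
both.

```python
import hashlib


def body_key(body: str) -> str:
    """Fingerprint a mail body; only 100% identical bodies share a key."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def effective_corpus(ham_bodies, spam_bodies):
    """Return the (ham, spam) bodies that would still carry weight.

    Assumption for illustration: duplicates inside one folder collapse
    to a single entry, and a body found in both folders is removed
    from both.
    """
    ham = {body_key(b): b for b in ham_bodies}
    spam = {body_key(b): b for b in spam_bodies}
    conflicted = ham.keys() & spam.keys()
    ham_unique = [b for k, b in ham.items() if k not in conflicted]
    spam_unique = [b for k, b in spam.items() if k not in conflicted]
    return ham_unique, spam_unique
```

With 5000 copies of one body plus one other mail in the ham list, the
sketch returns two ham entries. Adding a single copy of that body to the
spam list leaves only the other mail on the ham side.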

>I know that we must be missing something significant.

Yes - the concept!

You waste my time Ken.

Thomas



From:    K Post <nntp.p...@gmail.com>
To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
Date:    21.03.2016 16:41
Subject: Re: [Assp-test] Max Number Duplicate File Names



-From Thomas, posted elsewhere
>Remains the (my) question - what should be done with mails that
>reach the 'MaxAllowedHamDups' without breaking any concept and without
>creating a new folder (which breaks several concepts)?

The scenario where a bonehead user sends 5000 of the same message in an
Outlook mailmerge isn't just a conceptual possibility, it happens.  And
it's happening more and more frequently despite training, memos, reminders,
and a very good email blast system in place that eliminated the need for
mailmerges.

What if the nightly cleanup first deleted files with the same name in
excess of max dups, and then, as you already do, deleted files in excess
of the maximum total number of files?  I thought that was what was
already happening with the spam corpus, but apparently not.
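
The two-pass cleanup proposed above could look roughly like this. It is a
sketch in Python, not ASSP's maintenance code: `plan_cleanup` is a
hypothetical name, the file name pattern is inferred from the
`Your_Donation_Receipt--123456.txt` example quoted later in this thread,
and keeping the newest duplicates is an assumption.

```python
import re
from collections import defaultdict

# Assumed file name layout: <subject>--<number>.txt
NAME_RE = re.compile(r"^(?P<subject>.*)--\d+\.txt$")


def plan_cleanup(files, max_dups, max_files):
    """Return the set of file names to delete.

    files: iterable of (name, mtime) pairs for one collection folder.
    Pass 1 keeps at most max_dups files per subject (newest kept).
    Pass 2 trims the survivors to max_files, oldest first.
    """
    by_subject = defaultdict(list)
    for name, mtime in files:
        match = NAME_RE.match(name)
        subject = match.group("subject") if match else name
        by_subject[subject].append((mtime, name))

    delete = set()
    survivors = []
    for entries in by_subject.values():
        entries.sort(reverse=True)  # newest first
        survivors.extend(entries[:max_dups])
        delete.update(name for _, name in entries[max_dups:])

    survivors.sort()  # oldest first
    excess = len(survivors) - max_files
    if excess > 0:
        delete.update(name for _, name in survivors[:excess])
    return delete
```

The function only plans the deletions and never touches the file system,
so the result can be inspected before anything is removed.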

I only see upside to limiting the number of dups in notspam, but you've
stated elsewhere that the arguments herein don't make sense to you.  If
you're saying what we suggest doesn't make any sense, I know that we must
be missing something significant.  I know that bayesian filtering works
really well, but I only understand the inner workings from 35,000 feet. I
just can't understand how making every effort to ensure that our notspam
corpus remains diverse doesn't make sense.

Thanks again.  Hope we can continue this discussion.

On Mon, Mar 14, 2016 at 5:28 PM, K Post <nntp.p...@gmail.com> wrote:

> One of our staff inadvertently sent about 3400 of the same test messages
> out through our server.  Okay, okay, it was me - I had a loop coded wrong,
> and before I noticed what was going on and could stop it, about 3400 of the
> same message went out.  Fortunately, they were just to me.  Sure enough, all
> 3400 were in notspam.
>
> So, could we, and does it make sense, to keep discussing this?
>
> On Thu, Mar 10, 2016 at 1:47 PM, K Post <nntp.p...@gmail.com> wrote:
>
>> Isn't that exact same logic an argument for having the maximum number of
>> duplicate subjects apply to the HAM / notspam folder too?  5000 or 15000 of
>> the same message sent individually by (untrainable / apathetic) users would
>> fill the notspam folder and mess up HMM / Bayesian right?
>>
>> And for those RE / FWD / No subject emails, maybe we could have ASSP
>> ignore subjects shorter than say 5 or 6 characters when deleting duplicate
>> file names?  Then those files could get wiped out oldest first during the
>> maintenance.
>>
>> On Thu, Mar 10, 2016 at 11:18 AM, Thomas Eckardt <
>> thomas.ecka...@thockar.com> wrote:
>>
>>> Just think about the logic behind Bayesian and HMM - this will answer
>>> your question.
>>>
>>> Having the same mail in the spam folder multiple times will score the
>>> content as extremely spam heavy, even if your users are using the same
>>> content - but less often.
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>>
>>> From:    K Post <nntp.p...@gmail.com>
>>> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>>> Date:    10.03.2016 16:58
>>> Subject: Re: [Assp-test] Max Number Duplicate File Names
>>>
>>>
>>>
>>> I know you're all RTFM, but there's plenty of places in the GUI where the
>>> description isn't exactly clear or right.  For example
>>>
>>> MaxFiles
>>> If you're not using subjects as file names ( UseSubjectsAsMaillogNames ),
>>> this is the maximum number of files to keep in each collection (spam &
>>> nonspam)
>>> It's actually less than this -- files get a random number between 1 and
>>> MaxFiles.
>>>
>>> I AM using file names and MaxFiles DOES control the maximum number of
>>> files in each collection, despite what the description says, when
>>> MaintBayesCollection is on and no max age is set. The language is not
>>> clear and that makes us assume things, sometimes incorrectly, about what
>>> the GUI really means.  We've been working this way since ASSP came out.
>>> Because of this, I had no way of knowing that MaxAllowedDups >really<
>>> only applied to the spam collection.  I assumed the GUI meant the whole
>>> log of spam and NOTspam.  I don't think that's an unreasonable assumption,
>>> or call it an oversight, or a mistake on my part - but none of that
>>> justifies an angry sounding response from you.
>>>
>>> I'm not looking for a fight, but I feel like I have to keep justifying
>>> myself after you appear to be so angry with me, and the rest of us, who
>>> turn to you for enlightenment.  You're carrying the entire weight of this
>>> project on your shoulders.  It's a lot, I know.  Can we move on and have
>>> a reasonable discussion here?
>>>
>>> Is there a reason that MaxAllowedDups shouldn't also apply to the notspam
>>> collection?   Shouldn't we want that to be the case for the same reason
>>> that we have it for spam?   Maybe also to the errors collections?
>>>
>>> If we don't, wouldn't the case where a staff member sends the same basic
>>> message to 5000 people (against my wishes, but I can't control
>>> everything) take 1/3 of the other notspam messages out of the rebuild
>>> process?  How about if 20k messages are sent?
>>>
>>> Maybe I'm just not understanding, and that's why I'm asking, but I hope
>>> it doesn't result in any more scolding.
>>>
>>> Thank you
>>>
>>>
>>> On Thu, Mar 10, 2016 at 4:15 AM, Thomas Eckardt
>>> <thomas.ecka...@thockar.com> wrote:
>>>
>>> > >There are about 600 of those files in NotSpam.
>>> >
>>> > 'MaxAllowedDups','Max Number of Duplicate File Names'
>>> >   'The maximum number of logged files with the same filename (subject)
>>> > that are stored in the spam folder (spamlog),........
>>> >
>>> > I'll write in Hebrew - possibly the English is better, if you translate
>>> > it back to English.
>>> >
>>> > Thomas
>>> >
>>> >
>>> >
>>> > From:    K Post <nntp.p...@gmail.com>
>>> > To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>>> > Date:    10.03.2016 00:29
>>> > Subject: [Assp-test] Max Number Duplicate File Names
>>> >
>>> >
>>> >
>>> > I've got UseSubjectsAsMaillogNames checked (the messages are stored in
>>> > the folders using the subject name followed by a 6 digit number, as
>>> > expected)
>>> >
>>> > I've got MaxAllowedDups set to 3
>>> >
>>> > MaxBayesFileAge is 0
>>> > MaxFiles is 15000
>>> >
>>> > I'm noticing that MaxAllowedDups doesn't seem to be working.
>>> >
>>> > For example, a couple users often send emails with the subject
>>> > "Your Donation Receipt"
>>> > There are about 600 of those files in NotSpam.
>>> > Your_Donation_Receipt--123456.txt
>>> > where 123456 is a random differing number.
>>> >
>>> > Shouldn't only 3 of these files exist in the folder (with the exception
>>> > of those that were sent since the rebuild / maintenance window)?
>>> >
>>> > Thanks
>>> >
>>> >
>>> > ------------------------------------------------------------------------------
>>> > Transform Data into Opportunity.
>>> > Accelerate data analysis in your applications with
>>> > Intel Data Analytics Acceleration Library.
>>> > Click to learn more.
>>> > http://pubads.g.doubleclick.net/gampad/clk?id=278785111&iu=/4140
>>> > _______________________________________________
>>> > Assp-test mailing list
>>> > Assp-test@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/assp-test
>>> >
>>> >
>>> >
>>> >
>>> > DISCLAIMER:
>>> > *******************************************************
>>> > This email and any files transmitted with it may be confidential, legally
>>> > privileged and protected in law and are intended solely for the use of
>>> > the individual to whom it is addressed.
>>> > This email was multiple times scanned for viruses. There should be no
>>> > known virus in this email!
>>> > *******************************************************
