On Thu, 15 Feb 2018 14:32:36 -0600 (CST)
sha...@shanew.net wrote:
> I haven't checked the math in the Bayes plugin, but it explicitly
> mentions using the "chi-square probability combiner" which is
> described at http://www.linuxjournal.com/print.php?sid=6467
>
> Maybe I'm misunderstanding what
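For reference, the chi-square combiner described in that article is Fisher's inverse chi-square method applied to per-token probabilities, the same idea SpamBayes uses. Below is a minimal Python sketch of that technique. It assumes every token probability lies strictly between 0 and 1. The names `chi2q` and `chi_square_combine` are hypothetical, and this is not the SpamAssassin Bayes plugin's own code.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """P(chi^2 >= x2) for an even number of degrees of freedom (closed form)."""
    if dof % 2 != 0:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_square_combine(probs):
    """Combine per-token spam probabilities (each in (0, 1)) into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Sum logs instead of multiplying probabilities, to avoid float underflow.
    # spam_indicator is close to 1 when the token probabilities sit near 1.
    spam_indicator = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # ham_indicator is close to 1 when the token probabilities sit near 0.
    ham_indicator = chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + spam_indicator - ham_indicator) / 2.0


print(chi_square_combine([0.99, 0.95, 0.98]))  # close to 1
print(chi_square_combine([0.5, 0.5]))          # neutral tokens give 0.5
```

The number of tokens n sets the degrees of freedom, so many weak clues pointing the same way can add up to a confident result.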
On Thu, 15 Feb 2018, RW wrote:
On Thu, 15 Feb 2018 11:56:55 -0600 (CST)
sha...@shanew.net wrote:
So, the sample size doesn't matter when calculating the probability of
a message being spam based on individual tokens, but it can matter
when we bring them all together to make a final calculation.
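One place where raw counts, and not just frequencies, enter Robinson's scheme is his adjustment of a token's probability: f(w) = (s*x + n*p(w)) / (s + n). Here n is the number of training messages that contain the token, x is an assumed prior probability, and s is the weight given to that prior. The effect mostly matters for rare tokens. The following is a minimal Python sketch. The defaults s = 1.0 and x = 0.5 are illustrative assumptions, not SpamAssassin's tuned constants.

```python
def robinson_fw(p_w: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Shrink a raw token probability p_w toward the prior x.

    n is the number of training messages the token was seen in: with few
    sightings the result stays near x, with many it approaches p_w.
    """
    return (s * x + n * p_w) / (s + n)


# A token seen only in spam (raw p_w = 1.0), with growing evidence:
for n in (1, 10, 100):
    print(n, round(robinson_fw(1.0, n), 4))
# -> 1 0.75 / 10 0.9545 / 100 0.995
```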
On Thu, 15 Feb 2018 20:16:24 +0100
Reindl Harald wrote:
> On 15.02.2018 at 20:10, RW wrote:
> > I'm not saying that it doesn't matter how much you train, I'm saying
> > that if you have enough spam and enough ham Bayes is insensitive to
> > the ratio
>
> but not when the ratio differs in magn
On Thu, 15 Feb 2018 11:56:55 -0600 (CST)
sha...@shanew.net wrote:
> On Thu, 15 Feb 2018, RW wrote:
>
> > As I said, Bayes is based on frequencies.
> >
> > If a token occurs in 10% of ham and 0.5% of spam based on 10,000
> > hams and 10,000 spams, what do you think is likely to happen to
> > thos
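To make the 10% / 0.5% example concrete, consider a frequency-based token probability of the form p(w) = b / (b + g). Here b and g are the fractions of spam and ham that contain the token. Such a token keeps the same value however many hams are trained, provided its frequency stays the same. A minimal Python sketch follows. `token_spam_prob` is a hypothetical helper, and it omits the smoothing a real filter applies.

```python
def token_spam_prob(spam_hits: int, nspam: int, ham_hits: int, nham: int) -> float:
    """Frequency-based spam probability of a single token."""
    b = spam_hits / nspam   # fraction of spam containing the token
    g = ham_hits / nham     # fraction of ham containing the token
    if b + g == 0:
        raise ValueError("token never seen in training")
    return b / (b + g)


# 0.5% of 10,000 spams and 10% of 10,000 hams
p1 = token_spam_prob(50, 10_000, 1_000, 10_000)
# Same frequencies, but ten times as much ham trained
p2 = token_spam_prob(50, 10_000, 10_000, 100_000)
print(p1, p2)  # both 1/21, about 0.0476
```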
On Thu, 15 Feb 2018 19:24:14 +0100
Reindl Harald wrote:
> On 15.02.2018 at 19:20, RW wrote:
> > On Thu, 15 Feb 2018 17:15:47 +0100
> > You are talking about ultra-rare tokens here; the chances of these
> > dominating a classification are negligibl
> it is not - in 2015 I had to purge "in doubt"
On Thu, 15 Feb 2018 17:15:47 +0100
Reindl Harald wrote:
> On 15.02.2018 at 17:01, RW wrote:
> > On Thu, 15 Feb 2018 00:01:18 +0100
> > Reindl Harald wrote:
> >
> >> On 14.02.2018 at 23:07, RW wrote:
> >
> >>> My point is that an imbalance doesn't create a bias
> >
> >> wrong - w
On Thu, 15 Feb 2018, RW wrote:
On Thu, 15 Feb 2018 00:01:18 +0100
Reindl Harald wrote:
On 14.02.2018 at 23:07, RW wrote:
My point is that an imbalance doesn't create a bias
wrong - what you tried to say was "doesn't necessarily create a bias"
- but in fact when the imbalance is too big
On Thu, 15 Feb 2018 00:01:18 +0100
Reindl Harald wrote:
> On 14.02.2018 at 23:07, RW wrote:
> > My point is that an imbalance doesn't create a bias
> wrong - what you tried to say was "doesn't necessarily create a bias"
> - but in fact when the imbalance is too big *it does*
>
> simply thin
On Wed, 14 Feb 2018 16:20:30 +0100
Matus UHLAR - fantomas wrote:
> >On Tue, 13 Feb 2018 21:02:46 +
> >Horváth Szabolcs wrote:
> >> One more question: is there a recommended ham to spam ratio? 1:1?
>
> On 14.02.18 15:09, RW wrote:
> >No, this is a myth. Bayes computes token probabilities
On 02/14/2018 09:20 AM, Matus UHLAR - fantomas wrote:
On Tue, 13 Feb 2018 21:02:46 +
Horváth Szabolcs wrote:
One more question: is there a recommended ham to spam ratio? 1:1?
On 14.02.18 15:09, RW wrote:
No, this is a myth. Bayes computes token probabilities from a token's
frequencies in
On Tue, 13 Feb 2018 21:02:46 +
Horváth Szabolcs wrote:
One more question: is there a recommended ham to spam ratio? 1:1?
On 14.02.18 15:09, RW wrote:
No, this is a myth. Bayes computes token probabilities from a token's
frequencies in spam and ham, so it all scales through. If you have
20
On Tue, 13 Feb 2018 21:02:46 +
Horváth Szabolcs wrote:
> One more question: is there a recommended ham to spam ratio? 1:1?
No, this is a myth. Bayes computes token probabilities from a token's
frequencies in spam and ham, so it all scales through. If you have
2000 ham and 200 spam the prob
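The "scales through" point can be illustrated by contrasting raw counts with per-class frequencies for a corpus of 2000 ham and 200 spam. A minimal Python sketch with hypothetical numbers follows. Take a token present in 5% of each class. Using frequencies, it comes out neutral. Naively comparing raw counts would push it toward ham simply because more ham was trained.

```python
def prob_from_counts(spam_hits: int, ham_hits: int) -> float:
    """Naive estimate that ignores corpus sizes (biased by the ham:spam ratio)."""
    return spam_hits / (spam_hits + ham_hits)


def prob_from_frequencies(spam_hits: int, nspam: int, ham_hits: int, nham: int) -> float:
    """Estimate from per-class frequencies, so corpus sizes cancel out."""
    b = spam_hits / nspam
    g = ham_hits / nham
    return b / (b + g)


nham, nspam = 2000, 200
ham_hits, spam_hits = 100, 10  # the token appears in 5% of each class

print(prob_from_counts(spam_hits, ham_hits))                    # about 0.09, looks hammy
print(prob_from_frequencies(spam_hits, nspam, ham_hits, nham))  # 0.5, neutral
```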
They cannot (they do not want to, or lack the know-how) study the e-mails, and
therefore they cannot build a reliable corpus. All they can do is to trust the
ability of their users to study their own e-mails well enough to do the job,
hence the mess with ham/spam when feeding the Bayesian filter. Th
On 13 Feb 2018, at 9:33, Horváth Szabolcs wrote:
This is a production mail gateway that has been in service since 2015. I saw that a few
messages (both hams and spams) were automatically learned by
amavisd/spamassassin. Today's statistics:
3616 autolearn=ham
10076 autolearn=no
2817 autolearn=spam
134 a
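Figures like these can be gathered by counting the autolearn= values that SpamAssassin records in its results, for example in the X-Spam-Status header. The following Python sketch assumes each line contains a literal autolearn=<value> token. The sample lines and the function name are hypothetical, so check the actual amavisd/SpamAssassin log or header format before using it.

```python
import re
from collections import Counter

# \b plus "autolearn=" avoids matching e.g. "autolearn_force=no"
AUTOLEARN_RE = re.compile(r"\bautolearn=([A-Za-z_]+)")


def tally_autolearn(lines):
    """Count autolearn=<value> occurrences, one per line at most."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [  # hypothetical lines
    "... tests=BAYES_00 autolearn=ham autolearn_force=no ...",
    "... tests=BAYES_99 autolearn=spam autolearn_force=no ...",
    "... tests=HTML_MESSAGE autolearn=no autolearn_force=no ...",
    "... a line without any autolearn result ...",
]
print(tally_autolearn(sample))
```

With the figures above, 3616 autolearn=ham against 2817 autolearn=spam is roughly 1.3 ham per spam for that day.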
On Tue, 13 Feb 2018, Horváth Szabolcs wrote:
3. populate the ham database
That's the tricky part. As I mentioned earlier, I don't really want
end-users involved in this.
You might be able to find a few that are somewhat technically competent
and don't mind their ham samples being manually
John Hardin wrote on 2018-02-14 02:28:
Properly training your Bayes and increasing the score for BAYES_80,
BAYES_95, and BAYES_99
and BAYES_999
score BAYES_999 5000
/me hides, could not resist :=)
On Tue, 13 Feb 2018, David Jones wrote:
Properly training your Bayes and increasing the score for BAYES_80, BAYES_95,
and BAYES_99
and BAYES_999
is the best bet on this one.
--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174
Hello,
David Jones [mailto:djo...@ena.com] wrote:
> With non-English email flow, it's more challenging. If no RBLs hit, then you
> really must train your Bayes properly which requires some way to accurately
> determine the ham and spam. You must keep a copy of the
> ham and spam corpora and be al
On 02/13/2018 11:45 AM, Horváth Szabolcs wrote:
Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
I think I have no control over what is learnt automatically.
surely, don't do autolearning at all
This is a mail gateway for multiple companies. I'm not supposed to read e-mails
on that, or p
On 02/13/2018 11:24 AM, Horváth Szabolcs wrote:
Hello,
David Jones [mailto:djo...@ena.com] wrote:
There should be many more rule hits than just these 3. It looks like
network tests aren't happening.
Can you post the original email to pastebin.com with minimal redacting
so the rest of us can r
Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
>> This is a mail gateway for multiple companies. I'm not supposed to read
>> e-mails on that, or pick mails that can be used for learning ham
>
> how did you then manage 1.4 million ham samples in your biased corpus
Looks like in this amavisd-
Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
>> I think I have no control over what is learnt automatically.
> surely, don't do autolearning at all
This is a mail gateway for multiple companies. I'm not supposed to read e-mails
on that, or pick mails that can be used for learning ham.
On Tue, 13 Feb 2018, Horváth Szabolcs wrote:
After:
 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 0.0 HTML_MESSAGE           BODY: HTML included in
Hello,
David Jones [mailto:djo...@ena.com] wrote:
> There should be many more rule hits than just these 3. It looks like
> network tests aren't happening.
> Can you post the original email to pastebin.com with minimal redacting
> so the rest of us can run it through our SA to see how it scores
On 02/13/2018 07:55 AM, Horváth Szabolcs wrote:
Dear members,
A user repeatedly sends us spam messages to train SA.
Training, at the moment, requires manual intervention: an administrator verifies
that it's really spam, then issues sa-learn.
Then the user thinks the process is done, and the next time
Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
> > However, that doesn't happen.
> > 0.000 0 338770 0 non-token data: nspam
> > 0.000 0 1460807 0 non-token data: nham
> what do you expect when you train 4 times more ham than spam?
> frankly you "
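The nspam/nham counts quoted above come from `sa-learn --dump magic`. The following small Python sketch extracts them from that output and computes the ham:spam ratio. It assumes the whitespace-separated layout shown in the thread, where the count is the third field. `parse_magic` is a hypothetical helper.

```python
def parse_magic(dump: str) -> dict:
    """Return {'nspam': int, 'nham': int} parsed from `sa-learn --dump magic` text."""
    counts = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue
        key = line.rsplit(":", 1)[1].strip()  # e.g. "nspam", "nham"
        if key in ("nspam", "nham"):
            counts[key] = int(line.split()[2])  # third column holds the count
    return counts


dump = """\
0.000          0     338770          0  non-token data: nspam
0.000          0    1460807          0  non-token data: nham
"""
counts = parse_magic(dump)
ratio = counts["nham"] / counts["nspam"]
print(counts, round(ratio, 2))  # ham outnumbers spam by roughly 4.3 to 1
```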
Dear members,
A user repeatedly sends us spam messages to train SA.
Training, at the moment, requires manual intervention: an administrator verifies
that it's really spam, then issues sa-learn.
Then the user thinks the process is done, and the next time when the same email
arrives, it will automatica
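The manual step described above (an administrator verifies the message, then runs sa-learn) could be scripted roughly as follows. This is a minimal Python sketch. `learn_message` is a hypothetical wrapper that uses only the standard `sa-learn --spam` / `--ham` options. With dry_run=True it returns the command without running it. In an amavisd setup, sa-learn may need to run as the user that owns the Bayes database.

```python
import subprocess


def learn_message(path: str, is_spam: bool, dry_run: bool = False) -> list:
    """Build (and unless dry_run, run) an sa-learn command for one verified message."""
    cmd = ["sa-learn", "--spam" if is_spam else "--ham", path]
    if not dry_run:
        # check=True raises CalledProcessError if sa-learn exits non-zero
        subprocess.run(cmd, check=True)
    return cmd


print(learn_message("/var/tmp/reported.eml", is_spam=True, dry_run=True))  # hypothetical path
```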