OT - 18 months and I'm still alive

2018-02-02 Thread Marc Perkel

hello everyone,

I haven't been posting much here lately but I want to let everyone know 
that I'm still alive. I sent this out to my list of friends. Feel free 
to share it. And I'm still filtering spam.


Hello friends and family,

As of today it has been 18 months since I was diagnosed with stage 4 lung 
cancer on Aug 1 2016. And I'm still alive and feel mostly normal. Back 
when I was diagnosed I didn't expect to still be around this long 
especially considering that I have refused all conventional treatment. 
In fact, it's more likely that chemo would have had no effect against my 
cancer and would likely have killed me.


I have just completed a second round of radiation and immunotherapy 
drugs and this time there was no effect either positive or negative. The 
idea was that if an immune response is in progress, this will 
boost it. But at this point I still have no idea why I'm still alive 
because the scans remain confusing.


The last scan in early December showed everything the same size, nothing 
changed; nothing bigger, nothing smaller, no new mets. Most people would 
be thrilled but I'm confused. The immune system trick I'm doing has a 
more binary outcome where it either totally wipes out the cancer or it 
does nothing. So I expected it to either be a lot bigger or a lot smaller.


There are several possible explanations.

1. White blood cells also light up on a PET scan and the cancer might 
already be totally dead and is very slowly being eaten by the immune 
system. And that would be great.


2. I have been taking an anti-cancer cocktail of my own design that 
attacks cancer through multiple pathways and this combination has kept 
the cancer at a stalemate, neither growing nor shrinking. But it would be 
odd that no tumor grew or shrank at all. But apparently this 
cancer is hard to image and results are not reliable. In the past 
different doctors have seen different things.


3. I have a very slow growing cancer, and I'm just lucky, and nothing I 
did had any significant effect.


4. The universe really can't get along without me in it and it changed 
its mind about booting me out.


At this point I actually feel like I know less than I used to but at the 
moment I'm working on 2 different strategies at the same time. While I 
am hopeful the immunotherapy trick worked I have added more anti-cancer 
supplements as well as starting on the Ketogenic diet. If option 2 is 
correct my new plan should set the cancer back quite a bit.


I recently found out that normal cells can run on either sugar or fat, 
but cancer runs on only sugar. That immediately led to the ketogenic 
diet where I eat no carbs and make up for the calories by eating fat. 
And as a side effect - I'm losing weight. There also is evidence that 
once I get to a very sugar starved state that hyperbaric oxygen 
treatments might really fry cancer. But I'm still designing a cocktail 
that should really kick butt in a way that has beneficial side effects.


The keto diet is very counterintuitive. Bacon good, fruit bad. You have 
to eat fat to lose fat. But I'm losing weight and women should find me 
even more irresistible. It's a diet that seems to work well for anyone 
contemplating dieting.


I'm also documenting everything so if you know people who have cancer - 
this might be useful. And feel free to pass this email to anyone 
interested. It might turn out that I'm one of the most advanced minds on 
the planet for fighting cancer and if that's true - isn't that just a 
little bit sad. Here's the link:


http://wiki.junkemailfilter.com/index.php/Cancer

I'm also discovering that the very premise that the oncology world is 
based on - is wrong. Most people believe cancer starts with a genetic 
mutation that leads to cancer. But it turns out that cancer might start 
as a metabolic disorder that leads to genetic damage. If this is true it 
might be easy to create a general cure for all cancers that targets the 
unique metabolism of the cancer cells. Something like creating a toxin 
that activates in the presence of fermentation byproducts and is attached 
to sugar as a delivery mechanism. You get a shot and the next day all 
your cancer is dead. More on that in the future.


So - bottom line is - it's more likely than not that I'll still be here next 
year. But I can still drop dead unexpectedly at any moment. If that 
happens either the universe will end or someone will start a cult in my 
name based on the worst aspects of my personality. (I'm glad I won't be 
around to see that.) But if I'm dying I'm doing it way too slow to be 
interesting. Thanks for reading my detailed explanation of no news really.


Marc Perkel
m...@perkel.com
Twitter: mperkel



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Cell phone networks list?

2017-10-24 Thread Marc Perkel
Does anyone have a cell phone network list of host names where email 
from cell phones might be coming from? So far I have:


mycingular.net
myvzw.com

Can you add to this list?

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Would anyone be interested in a SA enhancing service?

2017-09-22 Thread Marc Perkel
I think people are misunderstanding. It's not a spamd service. It's 
basically another rule you would add to your config.


I think I need to do it first and then talk about it.

On 09/22/17 09:33, Kevin A. McGrail wrote:
It's very feasible but it's a blurry off topic issue to even discuss 
here for a commercial service.


At worst you just make yourself a standard mx of record filter system 
provider.


If you want a "plugin" you just offer spamd service restricted by ip 
address with ssl.


Perhaps you are overthinking things? Just offer a complete spamd 
replacement instead of an extra test.


My $0.02.
Regards,
KAM

On September 22, 2017 12:17:04 PM EDT, Marc Perkel 
<supp...@junkemailfilter.com> wrote:


Probably both. Not sure. Just trying to see if it's feasible.

On 09/22/17 09:12, Kevin A. McGrail wrote:

Are you discussing a free or a commercial service?
Regards,
KAM

On September 22, 2017 11:40:50 AM EDT, Marc Perkel
<supp...@junkemailfilter.com> wrote:

This is something I'm thinking about doing - providing a service that
integrates into SA as a plug in and communicates with my servers to
return a useful score enhancer.

If there is interest my initial demo test will be just stuffing the
subject line into an IP/port and returning a number where positive is
spam and negative is ham. This would just be a proof of concept.

The next level would be sending the message headers and eventually - the
full message.

Would need someone to write a simple plugin - not a perl guy - but how
hard can that be? Would eventually need to be encrypted though.

Starting with just the subject won't return a result all the time. Many
requests will return a 0 if it can't figure it out. If it does return a
result that is significantly away from 0 it's probably right. And it is
more likely to return a result from ham than spam confirming good email
as good. Obviously - using the header and then the whole message will be
more accurate.

I'm using new techniques no one else is using.

So - any interest?






--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Increasing spam level for MX backup server?

2017-09-22 Thread Marc Perkel
Yes - that's a favorite trick of spammers to hit the backup server. If 
you want you can add a third (fake) backup server:


tarbaby.junkemailfilter.com

It returns 451 on everything and gets rid of some of that spam (spammers 
don't retry) and I get some training data for my black lists.
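
As a rough sketch of the idea above: the fake backup only has to be the least-preferred MX, meaning the one with the highest preference number. The helper name `add_tarbaby`, the gap of 10, and the example.org hosts are assumptions made for illustration; only the tarbaby host name comes from this post.

```python
def add_tarbaby(mx_records, tarbaby="tarbaby.junkemailfilter.com", gap=10):
    """Return MX records with the tarbaby host appended as the last-resort MX.

    mx_records is a list of (preference, hostname) tuples; a lower
    preference number means the host is tried first.
    """
    highest = max(pref for pref, _ in mx_records)
    # Give the fake server a higher number than every real MX so that
    # well-behaved senders only reach it after the real servers.
    return sorted(mx_records + [(highest + gap, tarbaby)])


records = add_tarbaby([(10, "mail.example.org"), (20, "backup.example.org")])
for pref, host in records:
    # Print the records in zone-file style for a hypothetical example.org zone.
    print(f"example.org. IN MX {pref} {host}.")
```

Senders that retry after the 451 should still reach the real servers on a later attempt, while senders that never retry simply give up.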


On 09/22/17 09:19, Davide Marchi wrote:

Hi friends,
On Debian Jessie, with Postfix 2.11.3 and SpamAssassin 3.4.0-6, I've just 
set up a backup MX server, and now I see that new spam is coming in 
through the backup MX server.
Is there any way to reject mail arriving at the backup MX 
server if the primary server is up?
Also, a lot of spam comes from fake, nonexistent "aliases" of 
mine, for example:


On my server I host i...@foo.org and its aliases ali...@foo.org and 
ali...@foo.org, and nothing else. The spam comes from ali...@foo.org, which 
doesn't exist. How can I reject it and prevent delivery from these 
addresses without compromising the backup server?




Many many thanks!


Davide
Italy





--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Would anyone be interested in a SA enhancing service?

2017-09-22 Thread Marc Perkel

Probably both. Not sure. Just trying to see if it's feasible.

On 09/22/17 09:12, Kevin A. McGrail wrote:

Are you discussing a free or a commercial service?
Regards,
KAM

On September 22, 2017 11:40:50 AM EDT, Marc Perkel 
<supp...@junkemailfilter.com> wrote:


This is something I'm thinking about doing - providing a service that
integrates into SA as a plug in and communicates with my servers to
return a useful score enhancer.

If there is interest my initial demo test will be just stuffing the
subject line into an IP/port and returning a number where positive is
spam and negative is ham. This would just be a proof of concept.

The next level would be sending the message headers and eventually - the
full message.

Would need someone to write a simple plugin - not a perl guy - but how
hard can that be? Would eventually need to be encrypted though.

Starting with just the subject won't return a result all the time. Many
requests will return a 0 if it can't figure it out. If it does return a
result that is significantly away from 0 it's probably right. And it is
more likely to return a result from ham than spam confirming good email
as good. Obviously - using the header and then the whole message will be
more accurate.

I'm using new techniques no one else is using.

So - any interest?



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Would anyone be interested in a SA enhancing service?

2017-09-22 Thread Marc Perkel
This is something I'm thinking about doing - providing a service that 
integrates into SA as a plug in and communicates with my servers to 
return a useful score enhancer.


If there is interest my initial demo test will be just stuffing the 
subject line into an IP/port and returning a number where positive is 
spam and negative is ham. This would just be a proof of concept.


The next level would be sending the message headers and eventually - the 
full message.


Would need someone to write a simple plugin - not a perl guy - but how 
hard can that be? Would eventually need to be encrypted though.


Starting with just the subject won't return a result all the time. Many 
requests will return a 0 if it can't figure it out. If it does return a 
result that is significantly away from 0 it's probably right. And it is 
more likely to return a result from ham than spam confirming good email 
as good. Obviously - using the header and then the whole message will be 
more accurate.


I'm using new techniques no one else is using.

So - any interest?
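
To make the proof of concept concrete, here is a minimal client-side sketch. The wire format (one UTF-8 subject line terminated by a newline, one numeric line in reply), the function names, and the threshold of 1.0 are all assumptions; no actual protocol has been specified in this post.

```python
import socket


def query_score(subject, host, port, timeout=5.0):
    """Send one subject line and read back a numeric score (hypothetical protocol)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(subject.encode("utf-8") + b"\n")
        reply = conn.makefile("r", encoding="utf-8").readline()
    # An empty reply is treated as 0, i.e. "could not figure it out".
    return float(reply.strip() or 0)


def interpret_score(score, threshold=1.0):
    """Map a score to a verdict: positive is spam, negative is ham."""
    if score >= threshold:
        return "spam"
    if score <= -threshold:
        return "ham"
    # Scores close to 0 mean the service had no opinion.
    return "undecided"
```

A plugin would call `query_score` with whatever host and port the service publishes, then add the result to the message score only when `interpret_score` returns something other than "undecided".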

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: OT - Possibly some good news

2017-07-06 Thread Marc Perkel

Hi Ted,

You know what's interesting is that the adaptive immune system seems to 
work a lot like a spam filter or an antivirus program. Technically what 
I did was a database update to my immune system to reclassify my cancer 
as enemy. And the code to kill the cancer is in the cancer.


On 07/05/17 09:54, Ted Mittelstaedt wrote:

Hi Marc,

There are drugs that will stimulate the production of white blood 
cells at a tremendous rate, I was on one of these back in '94 when I was
dealing with my cancer.

It's well known in oncology that everyone always has a few cancer cells
bouncing around in their bodies all the time, but that the immune system
takes care of them.  It's just that cancers grow so fast once they get
established that they overwhelm the immune system.  That's why once you
get diagnosed with cancer your body needs help.

My personal opinion is that there's another factor you aren't mentioning
here and that's whether or not you're a "fighter personality"  AKA 
jackass.  In short, do you like to fight?  (you do, I can tell that 
just by reading your post)


I've met a number of people with cancer, serious cancers, since my own.
Some have later died.  But, all of the people who have survived their
cancers that I've known - they have been fighters.   I've never known
a cancer survivor with a passive, resigned to their fate attitude who
has survived a serious cancer.

I don't think oncologists like to talk about this much because it seems
kind of unfair to say that if you're a nice person (you don't have a 
fighter personality) you're definitely gonna die, and if you are an
asshole (you do), you're probably gonna live (no guarantees, though).

Fighters have different ways they fight, also, and medical science
really doesn't like arbitrary cures, you know.  They want something
that works all the time for everyone, the same way.  That's why
they love the drugs so much and really dislike the holistic crap.

But I'm pretty sure most of the good medical researchers, if you
nailed them to a wall, they would admit this kind of thing exists,
and I daresay that there's a hell of a lot of drug researchers out
there trying to figure out what drug they can create that will
"switch on" the fighter personality...

Ted


On 7/4/2017 8:45 AM, Marc Perkel wrote:

I know this is off topic, but it is looking like I might not die from
cancer after all. At some point I'll write something up about how the
immune system is like a spam filter. But today - I think I might have
cured my incurable cancer.

As you all know from my previous announcements, I have been working
on designing a custom immunotherapy treatment that has never been tried
before, and the hard part, as usual, was getting the doctors to do it. Well -
I finally got the treatment and - it appears as if it worked. And I
stress the word "appears" because it looks like it's going to take
about 2 months before imaging is going to show what's happening. But my
cancer symptoms are gone.

I am in a state of stunned disbelief. Too early to believe it - too late
not to believe it.

On Monday June 19 I got an infusion of ipilimumab which is an
immunotherapy drug. 2 days later I got a series of 3 radiation
treatments (21st thru 23rd). Dosage, 3 fractions of 9 Gy X-rays from
Varian Trilogy set at 9MV. These treatments were unusual in that instead of
irradiating the whole tumor, I asked that they just burn a disk in the
center of the main tumor leaving the rest of the tumor undamaged. This
request was very counterintuitive in radiology because they are trained
to kill every cancer cell they can possibly hit and it took a lot of
work to get them to deliberately leave tumor undamaged.

But that was important because I was turning the tumor into a school,
not a battlefield, where I was teaching my immune system what the cancer
looked like (antigens) and classify it as an enemy. By using partial
radiation I created an environment where white blood cells in my immune
system could interact with dead cancer and learn it.

4 days after treatment I started getting a reaction. I was queasy, low
energy, aches and pains, chills. When I got home I had a fever of 101,
and it occurred to me, is this the fever I was hoping for?

Fever indicates that I'm having an immune response. My immune system is
fighting something. Was it attacking the cancer?

So I took a hot bath and used a heating pad to increase the fever and
got it up to 103. I wanted to create heat shock proteins and signal the
battle was on. Wednesday still had fever and was rather out of it.
Thursday morning fever broke and all my cancer symptoms were gone.

I have adenocarcinoma and the adeno part of the name means "mucus
secreting". On Thursday the mucus went to almost none. I had been
coughing up a lot even before I was diagnosed last August. I went out
and sawed limbs off a tree, hard work, and didn't cough up anything. At
night when I lay down and in the morning when I get up, almos

OT - Possibly some good news

2017-07-04 Thread Marc Perkel
elf is victory.


I will write more when I find out more.

Marc Perkel
Random Genius


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Outgoing email without DMARC

2017-05-02 Thread Marc Perkel



On 05/02/17 07:14, Rob McEwen wrote:

On 5/1/2017 10:30 PM, Marc Perkel wrote:

Might be slightly off topic but I've been running into more delivery
problems with outgoing email because I don't use DMARC. I don't know a
lot about it but is there some simple way I can get around this? Kind of
a pain in the rear.


Marc,

This probably has more to do with DKIM than DMARC?

Either way... you're not willing to jump (or haven't yet jumped) 
through the hoops that the largest ISPs/hosters want us all to jump 
through... meanwhile... so many of them (and for many many years) have 
sent such high volumes AND high percentages of outbound spam to all of 
our SMTPs - to such an extent that you and I would be out of business 
if our SMTP outbound traffic did that for just one week.


I sort of wish they (or many of them... "if the shoe fits...") would 
clean up their own act FIRST - get the basics done FIRST - before 
imposing new standards on the rest of us.


I'm in the same boat - I'm now having to set aside dozens of hours to 
get all various domains updated to DKIM so that they'll have more 
success sending to a certain large/famous hoster - who has sent my 
server a shitload of spam over the past several years (not just 
volume-wise - but percentage-wise... I'd be run out of town if I did 
that)




Yeah - I know what you mean. Many of these ISPs would be blacklisted if 
they weren't so big. I get an amazing amount of spam from the big guys. 
I was just wondering what I could do to get by.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Outgoing email without DMARC

2017-05-02 Thread Marc Perkel



On 05/02/17 03:54, RW wrote:

On Mon, 1 May 2017 19:30:01 -0700
Marc Perkel wrote:


Might be slightly off topic but I've been running into more delivery
problems with outgoing email because I don't use DMARC.

How do you know it's because you don't use DMARC?




The rejection message specified dmarc as the reason.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Outgoing email without DMARC

2017-05-01 Thread Marc Perkel
Might be slightly off topic but I've been running into more delivery 
problems with outgoing email because I don't use DMARC. I don't know a 
lot about it but is there some simple way I can get around this? Kind of 
a pain in the rear.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: New whitelisting trick using from and spf

2017-03-06 Thread Marc Perkel



On 03/06/17 15:22, David Jones wrote:

From: Marc Perkel <supp...@junkemailfilter.com>
Sent: Monday, March 6, 2017 11:05 AM
To: users@spamassassin.apache.org
Subject: Re: New whitelisting trick using from and spf

do you mean the header From: address?

because anyone doing SPF checks already does what you describe on the
envelope from: address.

Yes - I'm using the header From: address.

Not good.  SPF should be checked against the envelope-from
address which is more trustworthy.  The From: header can be
spoofed trivially with no validation/authentication if DMARC is
not enabled.  Most email is not enabled for actual DMARC checking.
Most have SPF enabled.  Some have DKIM enabled.  But DMARC
can go one step further to check the From: header and most don't
do it unless they are a major target of spoofing like Paypal, eBay,
etc.

Dave




Yes - I'm doing something different - and possibly more effective. And 
it's working really well. Those who spoof would fail the test and not 
get white listed. The fact that From is easier to spoof makes it more 
effective - not less.


So if the from is @paypal.com and the sending host is not SPF compatible 
then it doesn't get white listed. Seems to be working very well.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



List of legit mass mailers

2017-03-06 Thread Marc Perkel
Just wondering if anyone has - or is interested in - a list of legit 
mass mailing sources?


There are many domains that remail/deliver for other domains that are 
95%+ good email. And they are not perfect and sometimes they get scammed 
but are mostly good.


Just wondering if anyone has a list - or is interested in me producing 
such a list?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: New whitelisting trick using from and spf

2017-03-06 Thread Marc Perkel



On 03/06/17 04:19, Matus UHLAR - fantomas wrote:

On 05.03.17 10:38, Marc Perkel wrote:

Well, new to me. Maybe others have thought of this.

Many domains send nothing but good email and if you whitelist them 
based on FCRDNS all is good. Been doing that.


But ...

Many domains send nothing but good email and they send through 
reputable email sender services which are mostly good but not perfect. 
So can't just whitelist that.


What I'm doing now is whitelisting the domains that are good, but 
doing SPF checks on the from address.


do you mean the header From: address?

because anyone doing SPF checks already does what you describe on the
envelope from: address.


Yes - I'm using the header From: address.



If the from address is whitelisted AND the SPF of the from address is 
good - I pass the email.


or do you do this on MTA-level (which means it's off-topic)?



I do it at the MTA level - but it's not off topic because the concept 
can be applied to spamassassin.


Also - I have almost 100,000 domains in my hostkarma.junkemailfilter.com 
(127.0.0.1) rbl. So I'm passing a lot of good email with this trick.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



New whitelisting trick using from and spf

2017-03-05 Thread Marc Perkel

Well, new to me. Maybe others have thought of this.

Many domains send nothing but good email and if you whitelist them based 
on FCRDNS all is good. Been doing that.


But ...

Many domains send nothing but good email and they send through reputable 
email sender services which are mostly good but not perfect. So can't 
just whitelist that.


What I'm doing now is whitelisting the domains that are good, but doing 
SPF checks on the from address.


If the from address is whitelisted AND the SPF of the from address is 
good - I pass the email.


I'm still experimenting with this but I think I'm onto something.
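
A minimal sketch of the decision described above, assuming the SPF check against the header From: domain is done elsewhere and handed in as a result string. The function names and the `"pass"` convention are illustrative assumptions, not the actual implementation.

```python
from email.utils import parseaddr


def from_domain(from_header):
    """Extract the lowercased domain from a header From: value."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def should_whitelist(from_header, spf_result, good_domains):
    """Pass the message only if the From: domain is whitelisted AND SPF passed.

    spf_result is the outcome of an SPF check run against the header From:
    domain and the connecting IP, computed elsewhere (an assumption here).
    """
    return from_domain(from_header) in good_domains and spf_result == "pass"


good = {"paypal.com"}
print(should_whitelist("PayPal <service@paypal.com>", "pass", good))
print(should_whitelist("PayPal <service@paypal.com>", "fail", good))
```

A message that spoofs a whitelisted From: domain from a host not covered by that domain's SPF record falls through to normal filtering instead of being passed.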

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Possibly some good news - OT

2016-10-11 Thread Marc Perkel
. 
And these 2 substances are all over the place in NIH studies for new 
cancer treatments for a variety of age related cancers.


After starting to take the Amazon drugs I did notice that I'm coughing 
up less stuff and when I started taking the prescribed drugs there 
seemed to be another reduction in mucus production. And that might be a 
good sign. So I am now taking all 4 drugs in combination, and if I'm 
right this is the most effective treatment for my specific lung cancer 
ever used. And it made enough sense to talk my oncologist into giving it 
a try.


At this point I'm optimistic that I have kicked the can down the road 
and that it's now years and not months. If I get 2 years out of this I'm 
calling it a win. And - I might be dead wrong, it might not work, and 
unexpectedly dropping dead is a side effect of Vandetanib. But even if 
that happened - this is still my best choice. In my battle against 
cancer, round one goes to me.


Ultimately in a battle which includes certain death the only variable is 
the quality of the battle. I'm beginning to relate to Klingons in Star 
Trek in that the fight is as important as the outcome. And this is the 
kind of fight that represents who I am and how my final adventure plays 
out. And in some ways defeating the hospital and insurance bureaucracy is 
probably tougher than figuring out how to cure cancer. So the idea of me 
beating this is very unlikely, but as Elon Musk would say, "Success is 
at least one of the possible outcomes."


Now I'm off to learn more of the big medical words to prepare for round 
2. How hard can it be?


So - just wanted to share my possible good news and hopefully I will be 
around to be at 2 more Pioneer Awards in the future. That's my new 
working estimate.


Marc Perkel
/root

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: spamassassin and caching nameservers

2016-08-22 Thread Marc Perkel
For what it's worth I use PowerDNS for a recursive nameserver and am happy 
with it. Very easy to set up.


On 08/22/16 18:15, Alex wrote:

Hi all,
I've just set up spamassassin on a cable connection that appears to
have sporadic DNS timeouts using bind. It shouldn't be so slow that
queries timeout, but apparently they are. I'm hoping rbldnsd would
provide that additional responsiveness needed.

I've set up rbldnsd before, to be used as a way to query a local RBL.
Has anyone configured it as a local caching nameserver, and if so,
could you share your config?

I'd like it to listen on localhost/53 in place of bind and I would
think I would need the root zones in there somewhere, but there
doesn't appear to be many examples of doing this out there to
reference.

Is it a full-fledged nameserver, suitable enough for MX, A, TXT,
queries, etc for this purpose?

Thanks,
Alex




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 09:06, Dianne Skoll wrote:

On Mon, 22 Aug 2016 09:03:38 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:


The ones that are the same are of no interest. Only where it matches
one side and not the other.

But... but... that's exactly like Bayes if you throw out tokens whose
observed probability is not 0 or 1.

Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
and that's why you get good results.  Multiword Bayes is just as good,
and I know that from experience.




This is nothing like bayes. Bayes is creating a mental block. When I 
describe it to people who don't know bayes they immediately get it. If I 
describe it to people who know bayes - they confuse it. Bayes is a 
probability spectrum based on a frequency match on both sets. That's not 
even close to what I'm doing.


Also - some of what I'm doing is all combinations, not just sequential. 
So it's like a system that writes and scores its own rules. I just 
throw data at it and it classifies it.


The real magic is the feedback learning. So as it identifies ham it 
learns new words and phrases that then match email from other people. So 
it learns how normal people speak, it learns how spammers speak, and it 
identifies the DIFFERENCES between the two. And it's completely automated.



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 08:58, RW wrote:

On Mon, 22 Aug 2016 07:34:00 -0700
Marc Perkel wrote:


On 08/22/16 07:28, Dianne Skoll wrote:

The other two possibilities (no tokens in either or some tokens in
both) are undecidable.

Exactly!

In the past you've said that when there are tokens in both you compare
the counts.


I do a very little bit of that. I make additional sets I call nearly-ham 
and nearly-spam where the ratio is very high, and count it as a half score.
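
One way to picture the half score, as a sketch only: the ratio of 20 and the sign convention (positive for spam, negative for ham) are assumptions, since no actual threshold is given here.

```python
def token_score(ham_count, spam_count, ratio=20):
    """Score one token: +1 spam-only, -1 ham-only, +/-0.5 when heavily skewed."""
    if ham_count == 0 and spam_count == 0:
        return 0.0          # never seen: undecidable
    if ham_count == 0:
        return 1.0          # spam-only set
    if spam_count == 0:
        return -1.0         # ham-only set
    if spam_count >= ratio * ham_count:
        return 0.5          # nearly-spam: counts as a half score
    if ham_count >= ratio * spam_count:
        return -0.5         # nearly-ham: counts as a half score
    return 0.0              # seen on both sides: no influence
```

Tokens seen on both sides in comparable numbers still contribute nothing, which keeps this distinct from a per-token probability.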


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 07:45, Dianne Skoll wrote:

On Mon, 22 Aug 2016 07:34:00 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:


So.  What percentage of emails using your algorithm are actually
decidable?

Almost 100% if you look at a wide variety of tokens from multiple
attributes.

I can't believe that, or I'm missing something.  Almost every spam I see
contains words that also appear in ham.  Things like "this" or "invoice"
or "regards" or "dear".

What am I missing?




Hi Dianne, what you're missing are word combinations. Usually it's not a 
single word but a combination of words that trigger a result.



 Example of how NOT matching works

Let’s take 2 subject lines and see how this works.

“Meet hot Russian Brides Online!”
“I read an article about Russian Brides in a magazine”

A traditional spam filter using Bayesian or hard coded rules about 
“Russian Brides” might determine that only 1 out of 500 emails 
mentioning the phrase “Russian Brides” is a good email. Thus the second 
line would have points assessed against it in the classification process 
using these traditional methods.


Using the Evolution Filter the phrase “Russian Brides” is in both sets 
and therefore has no influence on the results. But the first subject 
matches these phrases in the Spam Only set.


“Meet hot”
“Meet hot Russian”
“Meet hot Russian Brides”
“hot Russian Brides Online!”
“Russian Brides Online!”
“Brides Online!”
“Online!”

The second subject matches these phrases on the ham only set that are 
never used on the spam set.


“I read an article”
“read an article”
“read an article about”
“about Russian”
“an article about”
“in a magazine”
“Brides in a”

So even though the phrase “Russian Brides” has no influence each subject 
hits either ham or spam many times where the same phrase was never used 
in the subject line in the opposite set. And the number of hits is 
significant enough just from these subjects to cause the fingerprints to 
be learned, and that’s just looking at the Subject attribute. When this 
is combined with testing all attributes the messages usually come out 
strongly on one side or the other.


In rule based systems one would not normally build a white list rule to 
allocate points based on seeing the phrase “read an article about”. 
That’s where the Evolution Filter is different. It didn’t need to have 
that rule because, since it is comparing to the infinite set of what is 
not matched on the other side, it dynamically creates billions of rules 
automatically.
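
The two-subject example above can be sketched in a few lines. This is a toy model under stated assumptions (whitespace tokenization, phrases of 1 to 4 consecutive words, a simple hit count) and not the Evolution Filter's actual code; the names `phrases` and `classify` are made up for illustration.

```python
def phrases(text, max_words=4):
    """Return every run of 1..max_words consecutive words in text."""
    words = text.split()
    found = set()
    for end in range(1, len(words) + 1):
        for size in range(1, max_words + 1):
            if size <= end:
                found.add(" ".join(words[end - size:end]))
    return found


def classify(subject, spam_only, ham_only):
    """Count hits in each exclusive set; phrases seen on both sides are ignored."""
    tokens = phrases(subject)
    spam_hits = len(tokens & spam_only)
    ham_hits = len(tokens & ham_only)
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    return "undecided"


spam_seen = phrases("Meet hot Russian Brides Online!")
ham_seen = phrases("I read an article about Russian Brides in a magazine")
shared = spam_seen & ham_seen          # includes "Russian Brides"
spam_only = spam_seen - shared
ham_only = ham_seen - shared
```

With only these two training subjects, a new subject containing "Meet hot" lands on the spam side, while one made up solely of shared phrases such as "Russian Brides" stays undecided.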







--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 07:40, Antony Stone wrote:

On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:


On 08/22/16 07:28, Dianne Skoll wrote:


What percentage of emails using your algorithm are actually
decidable?

Almost 100% if you look at a wide variety of tokens from multiple
attributes. Subject, body, content flags, header structure, combinations
of all domains reference, php scripts, name part of from addresses,
behavior flags.

I would have said that a very large number of the words used in spam mails are
the same as the words used in ham mails, so I suspect I'm confused about what
constitutes a "token".


The ones that are the same are of no interest. Only where it matches one 
side and not the other.




I fail to see how the "name part of from addresses" are unlikely to match ham,
for example, since I see quite a lot of spam apparently from myself.


Antony.



Some spammers have Viagra in the name part. The name part is very 
spammy. I also store to and from email addresses so that relationships 
between people corresponding create a ham result. (I filter outbound as 
well for some people)


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 07:37, Antony Stone wrote:

On Monday 22 August 2016 at 16:34:09, Marc Perkel wrote:


OK - Trying to make this really simple. Just talking about concept now.

Let's say I get an email where the subject is "I have aednocarsonoma of
the lung".

Right off you know it's ham because spammers never use the word
"aednocarsonoma" and normal people do. Spammers also never use:

"of the lung"
"the lung"
"aednocarsonoma of"

How do you create those boundaries to define the tokens?


Here's an example:

"the quick brown fox jumps over the lazy dog"

becomes ...

"the"
"quick" "the quick"
"brown" "quick brown" "the quick brown"
"fox" "brown fox" "quick brown fox" "the quick brown fox"
"jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps"
"over" "jumps over" "fox jumps over" "brown fox jumps over"
"the" "over the" "jumps over the" "fox jumps over the"
"lazy" "the lazy" "over the lazy" "jumps over the lazy"
"dog" "lazy dog" "the lazy dog" "over the lazy dog"











So - tell me you follow this so far. Spammers don't spam about
aednocarsonoma.

In this case I'm identifying ham because in some previous email people
were talking about lung cancer and those phrases were learned as ham.
But what makes it really ham is not just that it matches previous ham,
but it doesn't match previous spam.

A word like Viagra for example would produce no score because it is in
both sets. However "cheapest viagra online" would match spam and not
match ham indicating it's spam.

So what makes "cheapest Viagra online" a token, such that "cheapest" and
"online" are not tokens?




They would all be tokens. Just pointing out one that would match spam 
and not match ham. "cheapest" and "online" would likely be in both sets 
and would be ignored.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel

OK - Trying to make this really simple. Just talking about concept now.

Let's say I get an email where the subject is "I have aednocarsonoma of 
the lung".


Right off you know it's ham because spammers never use the word 
"aednocarsonoma" and normal people do. Spammers also never use:


"of the lung"
"the lung"
"aednocarsonoma of"


So - tell me you follow this so far. Spammers don't spam about 
aednocarsonoma.


In this case I'm identifying ham because in some previous email people 
were talking about lung cancer and those phrases were learned as ham. 
But what makes it really ham is not just that it matches previous ham, 
but it doesn't match previous spam.


A word like Viagra for example would produce no score because it is in 
both sets. However "cheapest viagra online" would match spam and not 
match ham indicating it's spam.


The magic here is that this detects both spam and ham. And it is 
especially good at detecting ham, which greatly reduces false positives.




Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 07:28, Dianne Skoll wrote:

On Mon, 22 Aug 2016 07:16:41 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:


Antony, yes - I don't store Set B. I store Set A. B is defined by
what's NOT in A. So I test A and if it's not matched it's set B. Set
B is just a negative match on A.

Let me ask you a question.  As far as I understand your algorithm, if
an email contains at least one token in the "ham" set and zero tokens in
the "spam" set, you classify it as ham.  And conversely, if it contains
at least one spam token but zero ham tokens, you classify it as spam.


YES! YES! YES!

Although I look at some thousand "fingerprints" to get a more 
significant result.




The other two possibilities (no tokens in either or some tokens in both)
are undecidable.


Exactly!



So.  What percentage of emails using your algorithm are actually decidable?


Almost 100% if you look at a wide variety of tokens from multiple 
attributes. Subject, body, content flags, header structure, combinations 
of all domains reference, php scripts, name part of from addresses, 
behavior flags.
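
The four cases in Dianne's summary can be written down in a few lines. This is only a sketch of the decision table as she states it, not the production filter: the function name `classify` and the tiny example sets are invented for illustration, and the real system is said to combine on the order of a thousand fingerprints rather than a handful.

```python
def classify(tokens, ham_set, spam_set):
    """Return 'ham', 'spam', or 'undecidable' for a message's tokens.

    ham_set and spam_set are the finite sets of tokens learned so far.
    """
    tokens = set(tokens)
    hits_ham = bool(tokens & ham_set)
    hits_spam = bool(tokens & spam_set)
    if hits_ham and not hits_spam:
        return "ham"
    if hits_spam and not hits_ham:
        return "spam"
    # No hits on either side, or hits on both sides.
    return "undecidable"


# Hypothetical learned sets, for illustration only.
ham_set = {"of the lung", "the lung"}
spam_set = {"cheapest viagra online"}

print(classify({"the lung", "hello"}, ham_set, spam_set))
print(classify({"cheapest viagra online"}, ham_set, spam_set))
print(classify({"the lung", "cheapest viagra online"}, ham_set, spam_set))
```

The more attributes that are tokenized (subject, body, headers and so on), the less often a message should land in the undecidable branch, which is the point of the "almost 100%" answer.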




Regards,

Dianne.





--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel



On 08/22/16 06:55, Antony Stone wrote:

On Monday 22 August 2016 at 15:46:41, Dianne Skoll wrote:


On Mon, 22 Aug 2016 06:04:49 -0700

Marc Perkel <supp...@junkemailfilter.com> wrote:

Set A - a  finite set - has some members,
Set B - an infinite set - is everything that is NOT in Set A

Set B is a very special case of an infinite set.  We're talking about
infinite sets in general.

Also, you have to realize that although set B is in principle infinite,
in practice it is not.  Computers have finite memory, and although the
number of email tokens representable in the memory of a computer is very,
very, very large, it's not infinite.

I do not think that Marc is proposing to actually store set B in a computer
(or anywhere else).

Set B is simply a theoretical construct, defined as the inverse of Set A, and
to discover whether something is a member of it, you do not search through the
infinite set B for a match, you instead check all members of finite set A for a
non-match.

If nothing in Set A matches X, then X is a member of Set B.


Antony.



Antony, yes - I don't store Set B. I store Set A. B is defined by 
what's NOT in A. So I test A and if it's not matched it's set B. Set B 
is just a negative match on A.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Matching infinite sets

2016-08-22 Thread Marc Perkel

I'm confused by the confusion here.

Set A - a  finite set - has some members,
Set B - an infinite set - is everything that is NOT in Set A

So you match a test item to Set A and if it matches it's a member of A. 
If it doesn't match Set A it's a member of B.


How is this not really simple?


Matching infinite sets

2016-08-21 Thread Marc Perkel
Actually - you can match an infinite set. And maybe this is what's 
hard for some people to wrap their heads around.


Suppose set A contains 2 items, apples and oranges.
So we define set B as everything in the universe that is not in set A.
So set B is an infinite set, everything in the universe EXCEPT apples 
and oranges.


Our first test set contains an orange - so it matches set A and not set B.
Our second test set contains a cherry - so it doesn't match set A but it 
does match set B.
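
A minimal Python sketch of the same idea, assuming nothing beyond the apples-and-oranges example: set B is never stored, and membership in B is answered by a failed lookup in the finite set A. The class name `ComplementOf` is made up for this illustration.

```python
class ComplementOf:
    """Everything that is NOT in a given finite set.

    The complement is conceptually infinite, so it is never materialized;
    membership is decided by a negative match against the finite set.
    """

    def __init__(self, finite_set):
        self._excluded = frozenset(finite_set)

    def __contains__(self, item):
        return item not in self._excluded


set_a = {"apple", "orange"}
set_b = ComplementOf(set_a)

print("orange" in set_a, "orange" in set_b)
print("cherry" in set_a, "cherry" in set_b)
```

The cost of testing membership in B is the same as one lookup in A, regardless of how large B is in principle.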


When you have a method that matches against infinite sets, it completely 
changes how you think about spam and ham detection.


On 08/16/16 12:57, Shawn Bakhtiar wrote:


By the way, you can’t match an infinite set (well theoretically but 
not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread Marc Perkel
For what it's worth I have noticed that people who are familiar with 
Bayesian filtering seem to have a mental block when it comes to 
understanding this. People who know nothing about bayesian get it 
instantly. Here's the actual formula.


card(Test_message intersect Spam diff Ham) minus card(Test_message intersect 
Ham diff Spam)
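
Read literally, the formula counts the test message's fingerprints that appear only in the spam corpus and subtracts the count of those that appear only in the ham corpus. A short Python sketch under that reading follows; the function name `evolution_score`, the sign convention (positive leans spam) and the sample corpora are assumptions for illustration.

```python
def evolution_score(test_message, spam, ham):
    """card(T intersect (Spam diff Ham)) minus card(T intersect (Ham diff Spam)).

    All three arguments are sets of fingerprints. Fingerprints present in
    both corpora cancel out and contribute nothing.
    """
    spam_only = spam - ham
    ham_only = ham - spam
    return len(test_message & spam_only) - len(test_message & ham_only)


# Hypothetical corpora: "viagra" and "online" occur on both sides.
spam = {"viagra", "online", "cheapest viagra online"}
ham = {"viagra", "online", "of the lung", "the lung"}

print(evolution_score({"viagra", "online", "cheapest viagra online"}, spam, ham))
print(evolution_score({"viagra", "of the lung", "the lung"}, spam, ham))
print(evolution_score({"viagra"}, spam, ham))
```

Note that (T intersect Spam) diff Ham and T intersect (Spam diff Ham) are the same set, so the grouping in the formula does not change the result.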



On 08/17/16 09:16, Shawn Bakhtiar wrote:


On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas 
<uh...@fantomas.sk <mailto:uh...@fantomas.sk>> wrote:


On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect 
HAM and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.


so, if mail matches both hammy and spammy tokens (or token sets), you 
don't

classify at all?



I guess what is confusing me (and I imagine others, as alluded to by 
Matus) is the fact that you are describing a special condition 
of Bayes' probability theorem. You are testing two variables (match 
SPAM and match HAM) (not matching is simply the negation of matching) 
thus giving you four conditions:


1) SPAM  && HAM
2) SPAM  && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM

Here is a great diagram to show the four probable conditions:
https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg

So (if I am correct) Matus is asking what if condition 1 is true? How 
are you classifying an email then? That is often the state of most 
emails, and thus the reason for Naive Bayes spam filtering, which 
generates a probability based on Bayes' probability theorem and is the 
conventional methodology to date. A rose by any other name.


Condition 4 is obvious: it's nothing you have ever seen, so classifying 
it anything other than HAM would be a huge mistake (IMHO), and fully 
covered by the aforementioned theorem as the probability of SPAM would 
(should) be 0. Same with Condition 3: obviously it never hits SPAM, so 
whether it matches HAM or not you're going to treat it as HAM anyway, 
same as condition 4.


That leaves condition 2. Which (if I'm not mistaken) is "... it 
matches SPAM and does NOT match HAM - then it's SPAM.". Which brings 
us back to Matus question, what if the email contains a single HAM 
token? Two HAM tokens? This is exactly what Bayes' probability theorem 
is designed for. All you are doing is defining a special condition in 
which the HAM probability is ZERO.


I think that's where I need to understand a bit more about what HAM 
means in this solution, does getting a hit on HAM somehow negate it 
being SPAM completely? In other words if the email contains some set 
of tokens that are SPAM, yet only one HAM token, that single HAM token 
makes it not SPAM? If so, you have a long way to go in convincing me 
that this is a good solution.


So if I say to you, "Let's get some lunch" that's ham because 
spammers never say that, but normal people do. So the way to test 
what "spammers never say" is to store what they do say and see if 
it's NOT in the list. (Thus the infinite set)




Actually I get SPAM with that very set of tokens in it. If somehow the 
HAM rating of it overrides the SPAM, I don't believe it would have a 
desirable effect.


I get plenty of:

"
Hay Shawn,

Hope you have time to do some lunch, click on this link and check out 
my new pictures!


Wannabe Phisher
"

Based on your example there's plenty of HAM and SPAM tokens in there, 
"Click on this link" high probability of SPAM-e-ness, would it get 
HAMed based on "hope you have time to do lunch". Or am I missing 
something?



Similarly, there's only so many ways to misspell viagra, and good 
email wouldn't have it spelled wrong.


Does that make sense?




Again, what you are saying makes sense in that it is special condition 
of the probability theory, What does not make sense is why would you 
not simply use the probability theory, that already encompasses that 
condition?



--
Matus UHLAR - fantomas, uh...@fantomas.sk <mailto:uh...@fantomas.sk> 
; http://www.fantomas.sk/

Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread Marc Perkel



On 08/17/16 03:51, Antony Stone wrote:

On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:


What I'm doing is looking for fingerprints in email that intersect HAM
and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers
never say that, but normal people do. So the way to test what "spammers
never say" is to store what they do say and see if it's NOT in the list.
(Thus the infinite set)

What length are the tokens you store in the list?  Single words (so the above
lunch example would contain 4 tokens)?  Entire phrases (so the above would be
just 1 token)?  Also how do you deal with spam which contains random cuttings
from legitimate texts (generally along with a graphic attachment and/or a URL
to get across the "real" message)?


I tokenize a lot of different things, but the fingerprints are at most 3 
to 4 tokens long. If you go longer than that you get a database that's 
too big. In the body I look at just the first 50 words, plus a "concept 
parser" that looks at the whole body.


http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter
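
As a rough illustration of fingerprints that are at most 4 tokens long, taken from only the first 50 words of a body, here is a minimal Python sketch. It is not the actual junkemailfilter.com code: the function name, the parameters `max_len` and `max_words`, and the plain whitespace tokenization are all assumptions made for the example.

```python
def fingerprints(text, max_len=4, max_words=50):
    """Return every phrase of 1..max_len consecutive words in the text.

    Only the first max_words words are considered. Splitting on whitespace
    is an assumption; a real tokenizer may treat punctuation differently.
    """
    words = text.split()[:max_words]
    phrases = []
    for end in range(len(words)):
        for n in range(1, max_len + 1):
            start = end - n + 1
            if start < 0:
                break  # not enough words before this position
            phrases.append(" ".join(words[start:end + 1]))
    return phrases


print(fingerprints("the quick brown fox"))
```

For the nine-word sentence "the quick brown fox jumps over the lazy dog" this produces 30 phrases, the last of which is "over the lazy dog". Capping the phrase length keeps the number of stored fingerprints roughly proportional to the number of words, which is one way to read the remark about database size.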




Similarly, there's only so many ways to misspell viagra, and good email
wouldn't have it spelled wrong.

Does this mean that people with bad spelling will more likely get classified as
spam, because they do not match the 'ham' group very well?

No - unless they misspell a lot of words the same way spammers misspell 
them. If a spammer isn't misspelling the same way and normal people are - 
it can count as ham - or be ignored.




Also, what happens to mail which contains lots of tokens which match neither set
(for example, perfectly legitimate email which happens to be in a language the
system hasn't been trained with)?

Mail that doesn't match either side produces no score.




Antony.



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread Marc Perkel



On 08/17/16 03:43, Matus UHLAR - fantomas wrote:

On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect 
HAM and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.


so, if mail matches both hammy and spammy tokens (or token sets), you 
don't

classify at all?



If a fingerprint matches both sets, it creates no score on that item. 
The idea is to generate a lot of fingerprints so that something scores. 
If you look at enough stuff to generate hundreds of fingerprints and you 
have big reference corpora then you will usually get a result on 
something. Usually a big result in one direction.


But ignoring if it's in both makes it more immune to poisoning.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-16 Thread Marc Perkel

Hi Shawn,

What I'm doing is looking for fingerprints in email that intersect HAM 
and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers 
never say that, but normal people do. So the way to test what "spammers 
never say" is to store what they do say and see if it's NOT in the list. 
(Thus the infinite set)


Similarly, there's only so many ways to misspell viagra, and good email 
wouldn't have it spelled wrong.


Does that make sense?


On 08/16/16 12:57, Shawn Bakhtiar wrote:

Marc,

Let me first say I am truly sorry to hear about your cancer. I lost my 
father to cancer just over a decade ago, after a long battle with 
sarcoma of the throat and tongue. So I pray and wish you the best.


I sent this to you in January 2016 (don't recall if I ever got a reply 
to it) but based on your document:


Set theory is not my strongest suit, but your diagram looks incorrect:
http://www.junkemailfilter.com/patent/patent5.pdf

Let:

H be ham
S be spam
E be an email

Then you state that:
HE = (H u E)
SE = (S u E)

But then the next diagram shows that there is some solution in which 
(HE u SE) and thus there may be some set which is (HE / SE). Even 
though in the first diagram S and H do not intersect.

This is not logical. Either (H u S) in which there are tokens common 
to the ham and spam token sets, or it does not, so which is it? In 
other words, if a token is both ham and spam how are you calculating 
its weight? Is it spam or ham?

Clearly it’s the latter (they do not intersect) as described in this:
http://www.junkemailfilter.com/patent/patent2.pdf

In which case you are simply looking to see if (H u E) > (S u E) and 
has nothing to do with what is not in the set, and there is indeed no 
(H u S) or the negation or NOT which is (H / S), so as everyone has 
been trying to explain it has NOTHING to do with what is NOT matched.

By the way, you can’t match an infinite set (well theoretically but 
not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)

Since the current Bayes learns both SPAM and HAM I imagine that it 
does a very similar thing, other than perhaps the larger multi word 
token sets, which seems a trivial thing to add, and available in other 
tool sets.



I'll only add this, if you believe that your SPAM has been greatly 
reduced. That's awesome! But have you really isolated it to this "new 
technique" or in playing around have you inadvertently changed 
something else that may have changed your results?


I am also not saying that you have not developed some "new technique", 
but that if you have, your description of it does not line up 
logically with the technique itself. Back in January you were looking 
to patent it, today you simply want it to live on. I suggest that if 
it is indeed the latter, than perhaps it's time to release the source 
code/scripts and let a few more eyes look at the logic to see exactly 
what is it doing, that you believe is so different than what is out there.


Again, I pray and hope the best for you,
Shawn




On Aug 16, 2016, at 6:45 AM, Marc Perkel <supp...@junkemailfilter.com 
<mailto:supp...@junkemailfilter.com>> wrote:


Thanks for the encouragement Ted. Unfortunately I know way too much 
about mathematics and I have a deep understanding of probability 
spectrums. There's a curve and I'm going to be somewhere on it. If 
I'm lucky I might be here for some time. But my life is a casino 
right now. And yes - there is also a probability spectrum for any of 
us getting hit by a bus tomorrow as well. SpamAssassin is based on 
statistical probabilities.


I have to have a dual track strategy. On one hand I need to do what 
I can to move the curve into the future. But at the same time I need 
to accomplish things that are important within a limited time slot as 
well.


Spam filtering isn't just another job to me. I actually have a 
passion for it. On a philosophical basis I look at the internet as 
the new nervous system for humanity and is now core to who we are as 
a species. And email is a very key technology in that nervous system.


In that context spam is like poison where predators suck some of the 
life out of humanity, and my real life has always been about the 
progress of the human race.


I am somewhat of a spam fighting savant. I actually run very little 
of my email through SpamAssassin, truth be told. Over the years I've 
thrown some ideas into the mix and sometimes they have been adopted 
to make SA better. Sometimes I just get shouted down by trolls and 
the ideas go nowhere.


At this point however there's a deadline and I have ideas that could 
be implemen

Re: I have some bad news

2016-08-16 Thread Marc Perkel



On 08/16/16 15:22, Ted Mittelstaedt wrote:


I read through the site, and here's why I probably couldn't implement it,
at least not as it stands now.

SpamAssassin basically depends on a diet of spam to feed the learner.
The learner learns what is spam.  If you add some ham into the learner
it works better - but the main thrust of it is feed me spam feed me spam.

Your method depends on a diet of -ham- not spam because you are doing 
the opposite of SA


My problem as an admin is this.  I can guarantee that when a customer
complains about a piece of junk, that what they give me is junk.

But customers don't complain about ham.  So I'm not going to see it.
And I cannot just iterate through all my customer mailboxes and
assume they are all full of ham, because some of my customers are
lazy and won't delete spam, or they don't read their mailbox for
months at a time, etc. etc.  I cannot guarantee I'll get only ham
by doing that - and so therefore I don't have a guaranteed source
of ham.

You said that your existing perl scripts are hacks and ugly.  But,
I'm wagering that most of your ugly programming is user interface
code that somehow coaxes your users to yield up a diet of ham.

My problem is there is a tremendous dearth of user interface code
out there to get EITHER spam or ham.

The closest I have ever found is the mailwatch interface but that is
god-awful complex.  I have it running on an ISP customer of mine's
mailserver but God what a hack.

Without that, all I can do is what I do now, which is make sure that
all customers accessing my server with IMAP have a junk mail folder and
know that if they drag spam into there that I'll suck it into the
learner.  Of course, POP3 clients have nothing and I cannot tell
some POP3 user "Oh if you really want to reduce your spam load then
give up your POP3 email client and use this slick web interface I have 
setup for you to send and receive email."


I'm actually not as interested in your engine as I am in how you get
your customers to participate with it because if you have found a
way to get 'em to do it, that is truly revolutionary.

Mine would rather bitch and moan about spam and when they get it,
just delete it - which while it puts it in a deleted folder that I
can get at (if they are IMAP) it mixes it up with deleted ham, so
I cannot take that mess of mixed unidentified spam and ham and use it 
for anything.


Ted


Hi Ted,

My system depends on a stream of both ham and spam creating a ham corpus 
and a spam corpus. I already had many rules in place (Not SA) to 
identify ham. Actually all you need is my RBL 
hostkarma.junkemailfilter.com with result 127.0.0.1 and the FcRDNS is 
good - there's your ham stream.


SA has a mindset of detecting spam. You have to change that to detecting 
spam and ham. Once you have streams going into the learner then you can 
not only increase spam detection, but you can positively identify good 
email as good and have almost no false positives. Then the output with 
strong scores is fed back into the learner, where it learns how people 
who send ham speak and how people who send spam speak. And it's very, 
very effective. And I'm just giving it away.
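
The feedback loop described here could look something like the following Python sketch. It is a guess at the shape of the idea rather than the real learner: the names `score` and `learn_if_confident`, the in-memory sets standing in for the corpora, and the threshold of 3 are all assumptions.

```python
def score(tokens, spam, ham):
    """Spam-only hits minus ham-only hits; positive leans spam."""
    return len(tokens & (spam - ham)) - len(tokens & (ham - spam))


def learn_if_confident(tokens, spam, ham, threshold=3):
    """Feed a message back into a corpus only when its score is strong.

    The threshold value is an illustrative assumption. Weakly scored
    messages are left out so that uncertain results do not train the filter.
    """
    s = score(tokens, spam, ham)
    if s >= threshold:
        spam |= tokens   # learn every fingerprint of this message as spam
    elif s <= -threshold:
        ham |= tokens    # learn every fingerprint of this message as ham
    return s


spam = {"cheapest viagra online", "click here now", "act now"}
ham = {"see you at lunch"}

strong = {"cheapest viagra online", "click here now", "act now", "limited offer"}
print(learn_if_confident(strong, spam, ham))   # strong spam score, gets learned
print("limited offer" in spam)

weak = {"limited offer", "see you at lunch"}
print(learn_if_confident(weak, spam, ham))     # mixed message, nothing learned
```

After the first call the previously unseen phrase "limited offer" is part of the spam corpus, which is how the system picks up new wording without anyone writing a rule for it.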


Thanks for looking at it though.



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-16 Thread Marc Perkel
Thanks for the encouragement Ted. Unfortunately I know way too much 
about mathematics and I have a deep understanding of probability 
spectrums. There's a curve and I'm going to be somewhere on it. If I'm 
lucky I might be here for some time. But my life is a casino right now. 
And yes - there is also a probability spectrum for any of us getting hit 
by a bus tomorrow as well. SpamAssassin is based on statistical 
probabilities.


I have to have a dual track strategy. On one hand I need to do what I 
can to move the curve into the future. But at the same time I need to 
accomplish things that are important within a limited time slot as well.


Spam filtering isn't just another job to me. I actually have a passion 
for it. On a philosophical basis I look at the internet as the new 
nervous system for humanity, one that is now core to who we are as a 
species. And email is a very key technology in that nervous system.


In that context spam is like poison where predators suck some of the 
life out of humanity, and my real life has always been about the 
progress of the human race.


I am somewhat of a spam fighting savant. I actually run very little of 
my email through SpamAssassin, truth be told. Over the years I've thrown 
some ideas into the mix and sometimes they have been adopted to make SA 
better. Sometimes I just get shouted down by trolls and the ideas go 
nowhere.


At this point however there's a deadline and I have ideas that could be 
implemented in SA very very easily. In fact it was through SA that I 
discovered Redis, and SA already talks to redis.


Although my innovation is excellent, as a programmer I'm mediocre. Never 
worked on a team. Easily frustrated. Probably somewhat autistic and 
somewhat arrogant. So mostly living in my own world doing my own 
development. I have my little online empire. I work from home. I make a 
great living. And I really like (most of) my customers and enjoy doing 
tech support. And it's allowed me a lot of free time to do things that 
I'm really interested in.


But my ideas are now my immortality, so I'm now releasing this to the 
world. And mostly this simple AI method that SA could easily implement.


This new spam filtering trick is not only extremely effective, it's 
extremely simple. I had it working in 2 days. The developers here could 
probably implement it in 1 day (at least the core functionality). And 
a team of better programmers could probably do a better job and get an 
even better result than I get. In fact you don't need or even want my 
sloppy code (not in Perl). All you need is to read the description of 
how it works and once you get it - coding it is trivial.


So - this is an opportunity to milk the mind of the dying spam savant. 
It works, it's easy, and I'm just handing it to you all. There is no 
reason I would be making this up. All you all need to do is accept this 
gift.



On 08/16/16 01:03, Ted Mittelstaedt wrote:

Hi Marc,

  Back in 1994 I was diagnosed with testicular cancer, it was 
essentially "stage 4" as it had metastasized throughout my body.


  But, it responded to chemo and here I am today.  In fact ironically
my original oncologist died a few years ago - on a fishing trip he had
an accident and drowned.

  The Universe has an interesting sense of humor and likes to throw
curve balls.  Take what you have been told about your "probability
spectrum" and toss it in the trash - hakuna matata.   You could 
accidentally step in front of a bus tomorrow and be dead.   You could

live another 20 years.   Statistics on people only have meaning on
large groups of people - they are irrelevant when it comes to the
individual.

  I've met a number of people who had serious cancers.  And I learned
one thing from that.   The people who survived - every one of them,
fighters.  And everyone fights differently.  Some get on the food 
bandwagon and try overdosing on green tea and every alleged 
anti-cancer food out there.  Others jump into yoga, and I knew one guy 
who went out and binge-watched Monty Python to spend as much time 
laughing as possible.  Me, I fought on a more mental approach.  I 
dropped everything in my life that I was not completely satisfied with 
- I turned my back on my job, my apartment, etc. - every burden or 
responsibility that I had which I didn't like and didn't really want - 
and dove into the treatment, and I never let myself believe I was in 
any danger of dying.


  Of course, not all who fight, survive.  But I will say with absolute
conviction that everyone I ever met who had a serious cancer and had
that "attitude of acceptance", later died.  You are a fighter or you
wouldn't even be here.  Now, fight to win.

Ted




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



I have some bad news

2016-08-15 Thread Marc Perkel
 the essence of who I am 
and what makes my existence have meaning will be preserved.


I have always believed that if a person decides to "own their story" and 
choose to live a life worth living that when they are faced with the end 
of their personal existence it would be much easier. And now that I am 
there I can say it is definitely true. I have not lived a perfect life 
and looking back there are quite a few things where I could have made a 
better choice. But at this point I'm feeling unusually positive about my 
situation as my last adventures unfold.


While I have spent much of my life writing software for cyberspace I 
have also written quite a bit of software for meat space. This email is 
an example of that. Meat space is coded in ideas and philosophies and 
I'm hoping in the time I have left to see what else I can accomplish. 
Facing death definitely sharpens the mind so I'm going to take advantage 
of that.


I suppose I'll wrap this up here as I can ramble on forever. And forever 
isn't quite as long as it used to be.


Marc Perkel
/root

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: ixhash.junkemailfilter.com seems to be broken currently

2016-06-21 Thread Marc Perkel



On 06/21/16 00:03, Alessio Cecchi wrote:

Il 20/06/2016 16:22, Reindl Harald ha scritto:

since Marc is present on this list and maybe others using it too:

dig A c134389d7cefd3aadce78714669239f2.ixhash.junkemailfilter.com.
status: SERVFAIL
Query time: 1798 msec

so at least for the last 2 days the rule below slows down scanning

score       JEF_IXHASH 1.0
ixhashdnsbl JEF_IXHASH ixhash.junkemailfilter.com.
body        JEF_IXHASH eval:check_ixhash('JEF_IXHASH')
describe    JEF_IXHASH DIGEST: ixhash.junkemailfilter.com


Hi,

Marc, some weeks ago, confirmed to me that ixhash.junkemailfilter.com 
is no more in use.


Ciao


Yeah - I had problems keeping it stable. But - I'd still like to 
contribute. If someone wants to help me get it going again or wants a 
spam feed from me I'll set it up.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Spam Filtering Trick that could be easily adapted to Spam Assassin

2016-05-18 Thread Marc Perkel
Tried to send this into the list but I think it had so many spam phrases 
it got blocked. So I'll just link to my wiki.


http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter

This is a spam filtering trick I'm using but it's not SA, but could be 
easily adapted to SA.


Rather that just scan for regex strings it's useful to have a way to 
tell what things the message is talking about and reduce those to a 
single token that represents a concept. Then the concepts can be 
combined to produce rules or fed into Bayes for automatic scoring.
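
A minimal sketch of the idea in Python, not the filter described on the wiki. The concept names and patterns below are made up for illustration; the point is only that several phrasings collapse into one token, and tokens can then be combined into a rule.

```python
import re

# Hypothetical concept table: each concept name maps to a pattern that
# recognizes a few ways of talking about that concept.
CONCEPTS = {
    "MONEY": re.compile(r"\b(?:million dollars|the sum of|bank transfer)\b", re.I),
    "DYING": re.compile(r"\b(?:diagnosed with|months to live|dying of)\b", re.I),
    "URGENT": re.compile(r"\b(?:act now|urgent response|reply today)\b", re.I),
}

def concepts_in(text):
    """Return the set of concept tokens whose pattern matches the text."""
    return {name for name, pattern in CONCEPTS.items() if pattern.search(text)}

def combined_rule(tokens):
    """Example combination rule: money talk together with a dying sender."""
    return {"MONEY", "DYING"} <= tokens

message = "I was diagnosed with cancer and wish to give you the sum of $4 million."
tokens = concepts_in(message)
print(sorted(tokens), combined_rule(tokens))
```

The same token set could be handed to a Bayes learner instead of a hand-written rule.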




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Reporting gmail spam to Google

2016-05-17 Thread Marc Perkel

Is there any address that I can forward gmail spam to google for reporting?

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Interesting rule combo results

2016-03-09 Thread Marc Perkel



On 03/09/16 07:33, Dave Funk wrote:

On Tue, 8 Mar 2016, Marc Perkel wrote:


This is from the for-what-it's-worth department.

I've generated the following rules combination lists.

The ham list is rule combinations sorted by the number of ham hits 
that have 0 spam hits.
The spam list is rule combinations sorted by the number of spam 
hits that have 0 ham hits.


There are some of my personal rules mixed in.

Just posting this just to see if anyone sees any value in this.

SPAM RULES:

11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
 5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
 5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE

[snip..]


HAM RULES:

   132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
   132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
   131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC

[snip..]

80056 HTML_MESSAGE
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID

[snip..]

Marc,

Maybe I'm misunderstanding your list but it looks like you've got 
HTML_MESSAGE by itself in the HAM RULES (IE zero spam hits on 
HTML_MESSAGE)
but you've also got a rule combo of HTML_MESSAGE 
RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
as the top SPAM RULES (which implies that there is SPAM that hits 
HTML_MESSAGE too).


Similar situation for DKIM_SIGNED & DKIM_VALID

Also how can you have 132983 hits on the combo of DKIM_SIGNED 
MAILTO_LINK RDNS_DYNAMIC

but only 59189 hits on DKIM_SIGNED by itself?



That's a valid observation. In the learner I'm working on I'm 
experimenting with an interesting forgetter that wipes out and restarts 
some of the keys. Part of the process of getting rid of bad data takes 
some good data with it and usually the good data recovers over time. 
This is still very experimental. I'm just applying my new filter to just 
the rule names coming out of SA and completely ignoring the scoring or 
even if it's a spam or ham rule. I just wanted to see what the result 
would be. To see if I can generate SA rules from my data.


So far - crude at best.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Interesting rule combo results

2016-03-09 Thread Marc Perkel



On 03/09/16 06:45, RW wrote:

On Tue, 8 Mar 2016 22:25:09 -0800
Marc Perkel wrote:


This is from the for-what-it's-worth department.

I've generated the following rules combination lists.

The ham list is rule combinations sorted by the number of ham hits
that have 0 spam hits.
The spam list is rule combinations sorted by the number of spam
hits that have 0 ham hits.
...
...
HAM RULES:
...
   80056 HTML_MESSAGE


What's happening here? This seems to imply that  HTML_MESSAGE only
appears in ham.




I think my results are a little strange in that I might not be training 
off all the data but just that which gets past all my other filters. I'm 
still working on this but thought I'd share what it came up with for 
better or worse.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Interesting rule combo results

2016-03-08 Thread Marc Perkel
 78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
 77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
 77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
 76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
 72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
 59189 DKIM_SIGNED
 56792 DKIM_VALID
 36441 HTML_MESSAGE MAILTO_LINK
 36399 MAILTO_LINK
 34960 DKIM_VALID_AU
 32155 DKIM_SIGNED FREEMAIL_FROM_END_DIGIT MAILTO_LINK
 31739 DKIM_VALID FREEMAIL_FROM_END_DIGIT MAILTO_LINK
 31586 DKIM_SIGNED USER_IN_DEF_DKIM_WL
 31491 DKIM_VALID USER_IN_DEF_DKIM_WL
 31354 DKIM_VALID_AU USER_IN_DEF_DKIM_WL
 30286 DKIM_SIGNED FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
 30191 DKIM_VALID FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
 29567 HTML_MESSAGE USER_IN_DEF_DKIM_WL
 28448 USER_IN_DEF_DKIM_WL
 27835 DKIM_SIGNED DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
 27819 DKIM_VALID DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
 27497 DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT MAILTO_LINK
 27224 DKIM_VALID_AU FREEMAIL_FROM HTML_MESSAGE MAILTO_LINK
 27135 DKIM_VALID_AU FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
 25738 DKIM_SIGNED DKIM_VALID_AU HTML_MESSAGE LOTS_OF_MONEY
 25721 DKIM_VALID DKIM_VALID_AU HTML_MESSAGE LOTS_OF_MONEY
 24140 HTML_MESSAGE WHILE_SUPPLIES
 23120 BANG_MORE
 22958 DKIM_SIGNED WHILE_SUPPLIES
 22434 DKIM_VALID_AU FREEMAIL_FROM MIME_QP_LONG_LINE
 22406 CALL_FREE DKIM_SIGNED DKIM_VALID HTML_MESSAGE
 20571 DKIM_SIGNED HTML_MESSAGE MAILTO_LINK RDNS_DYNAMIC
 16517 CALL_FREE DKIM_SIGNED DKIM_VALID_AU HTML_MESSAGE
 16429 CALL_FREE DKIM_VALID DKIM_VALID_AU HTML_MESSAGE
 16263 DKIM_VALID_AU URI_TRY_3LD
 16036 DKIM_SIGNED DKIM_VALID USER_IN_DEF_DKIM_WL
 15975 DKIM_SIGNED DKIM_VALID_AU USER_IN_DEF_DKIM_WL
 15940 DKIM_VALID DKIM_VALID_AU USER_IN_DEF_DKIM_WL
 15036 DKIM_VALID_AU HTML_IMAGE_RATIO_02 HTML_MESSAGE MIME_HTML_ONLY
 14919 GMD_PDF_SQUARE
 14834 DKIM_SIGNED FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 14745 DKIM_SIGNED DKIM_VALID FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 14661 DKIM_VALID FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 14459 DKIM_SIGNED HTML_MESSAGE USER_IN_DEF_DKIM_WL
 14431 DKIM_VALID HTML_MESSAGE USER_IN_DEF_DKIM_WL
 14409 DKIM_VALID_AU HTML_MESSAGE USER_IN_DEF_DKIM_WL
 14030 DKIM_SIGNED FREEMAIL_ENVFROM_END_DIGIT FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 13632 DKIM_SIGNED HTML_MESSAGE MAILTO_LINK MIME_QP_LONG_LINE
 13351 HTML_MESSAGE SUBJ_2_CREDIT
 13265 SUBJ_2_CREDIT
 13163 DKIM_SIGNED SUBJ_2_CREDIT
 13055 DKIM_SIGNED DKIM_VALID MAILTO_LINK MIME_QP_LONG_LINE
 13037 DKIM_SIGNED DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 13033 DKIM_VALID DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 12958 DKIM_VALID_AU FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
 12879 DKIM_VALID HTML_MESSAGE MAILTO_LINK MIME_QP_LONG_LINE
 12514 GMD_PDF_SQUARE HTML_MESSAGE
 12238 DKIM_SIGNED HTML_MESSAGE LOTS_OF_MONEY MIME_HTML_ONLY
 12220 DKIM_SIGNED DKIM_VALID LOTS_OF_MONEY MIME_HTML_ONLY
 12071 MAILTO_LINK WHILE_SUPPLIES
 12044 DKIM_VALID HTML_MESSAGE LOTS_OF_MONEY MIME_HTML_ONLY
 11997 DKIM_VALID SUBJ_2_CREDIT
 11667 DKIM_VALID_AU SUBJ_2_CREDIT
 11360 CALL_FREE DKIM_SIGNED DKIM_VALID MAILTO_LINK
 11298 DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
 11270 DKIM_SIGNED HTML_IMAGE_RATIO_04 HTML_MESSAGE MAILTO_LINK
 11152 DKIM_SIGNED DKIM_VALID HTML_IMAGE_RATIO_04 MAILTO_LINK
 10988 DKIM_VALID RAZOR2_CHECK SUBJ_GROUP
 10935 CALL_FREE DKIM_SIGNED HTML_MESSAGE MAILTO_LINK
 10825 MIME_HTML_ONLY USER_IN_DEF_DKIM_WL
 10809 DKIM_VALID HTML_IMAGE_RATIO_04 HTML_MESSAGE MAILTO_LINK
 10790 DKIM_SIGNED MAILTO_LINK SUBJ_ALL_CAPS
 10499 CALL_FREE DKIM_VALID HTML_MESSAGE MAILTO_LINK
 10247 DKIM_SIGNED DKIM_VALID HTML_MESSAGE SUBJ_GROUP
 10223 DKIM_VALID MAILTO_LINK SUBJ_ALL_CAPS
 10047 DKIM_SIGNED DKIM_VALID_AU HTML_IMAGE_RATIO_02 MAILTO_LINK
 10028 DKIM_VALID DKIM_VALID_AU HTML_IMAGE_RATIO_02 MAILTO_LINK

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Anyone using ASN data

2016-03-04 Thread Marc Perkel
Just wondering if anyone is using ASN information and if so - what are 
you doing with it?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: PDF files containing executables?

2016-03-03 Thread Marc Perkel



On 03/03/16 13:27, John Hardin wrote:

On Thu, 3 Mar 2016, Dianne Skoll wrote:


On Thu, 3 Mar 2016 13:03:44 -0800
Marc Perkel <supp...@junkemailfilter.com> wrote:


Thanks for the response. I'm in the spam filtering business and I'm
wondering what I can use (from the command line?) to detect if a PDF
has any kind of script attached that would be executable. That way I
might block based on what's embedded in a PDF.


There are tools.  Google is your friend.

However, many legitimate PDF files contain Javascript snippets. Blocking
solely on that basis will lead to many FPs.


I'd argue the "legitimate" part of that statement... :)

Sounds to me like it should be: block any PDF with 
javascript/flash/java with whitelisted bypass.


What sane MTA accepts bare executable attachments from the Internet at 
large any more? The same policy should apply to PDFs.





If I could detect java or some other executable inside a PDF then the 
message would have to be white or near white before I allowed it to pass.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: PDF files containing executables?

2016-03-03 Thread Marc Perkel



On 03/03/16 13:15, Dianne Skoll wrote:

On Thu, 3 Mar 2016 13:03:44 -0800
Marc Perkel <supp...@junkemailfilter.com> wrote:


Thanks for the response. I'm in the spam filtering business and I'm
wondering what I can use (from the command line?) to detect if a PDF
has any kind of script attached that would be executable. That way I
might block based on what's embedded in a PDF.

There are tools.  Google is your friend.

However, many legitimate PDF files contain Javascript snippets.  Blocking
solely on that basis will lead to many FPs.

Regards,

Dianne.




In that case I'd like to know if there's java in it so that if the 
message has other risk flags I can block it.
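
One rough way to get such a flag from the command line is to count the PDF name objects that usually accompany active content. The Python sketch below is an illustration only: the list of names is an assumption and is not exhaustive, and a plain byte scan will miss names hidden inside compressed object streams or written with #xx hex escapes, so a hit is a risk signal rather than proof either way.

```python
import re

# PDF name objects that commonly indicate active content. This list is an
# assumption for illustration and is not exhaustive.
ACTIVE_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile")

def pdf_active_content(data):
    """Count occurrences of each active-content name in raw PDF bytes."""
    counts = {}
    for name in ACTIVE_NAMES:
        # Require a non-alphanumeric character after the keyword so that
        # /JS does not match a longer name that merely starts with /JS.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts

sample = (
    b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
print(pdf_active_content(sample))
```

The resulting counts could be combined with the other risk flags on the message before deciding to block.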


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: PDF files containing executables?

2016-03-03 Thread Marc Perkel



On 03/03/16 13:02, David B Funk wrote:

On Thu, 3 Mar 2016, Marc Perkel wrote:

A customer of mine inquired about executable viruses inside of PDF 
files. Is that so? And if it is - is there any way of detecting 
executables inside of PDF?


I don't know that PDFs can contain classical ".exe" type executables 
but they
can clearly contain 'active content' (javascript, flash, etc) which 
can be

abused as a malware delivery vehicle.
So for practical purposes PDFs can be considered potential virus 
containers.


AV scanners have rules for detecting malware inside PDFs but that's 
always a catch-up game.




Hi David,

Is there a way to detect any executable code so that I can just block 
all PDF files with executables?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: PDF files containing executables?

2016-03-03 Thread Marc Perkel

Hi Kevin,

Thanks for the response. I'm in the spam filtering business and I'm 
wondering what I can use (from the command line?) to detect if a PDF has 
any kind of script attached that would be executable. That way I might 
block based on what's embedded in a PDF.


On 03/03/16 12:59, Kevin Miller wrote:

Not sure about viruses per se, but I know that there have been instances of 
embedded javascript in .pdf files which have been malicious.

Javascript can be turned off in Acrobat preferences.  Likely a toggle in other 
.pdf readers as well.

...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4588 Registered Linux User No: 307357


-----Original Message-----
From: Marc Perkel [mailto:supp...@junkemailfilter.com]
Sent: Thursday, March 03, 2016 11:26 AM
To: users@spamassassin.apache.org
Subject: PDF files containing executables?

A customer of mine inquired about executable viruses inside of PDF files. Is 
that so? And if it is - is there any way of detecting executables inside of PDF?

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



PDF files containing executables?

2016-03-03 Thread Marc Perkel
A customer of mine inquired about executable viruses inside of PDF 
files. Is that so? And if it is - is there any way of detecting 
executables inside of PDF?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Redis Bayes Expire

2016-03-02 Thread Marc Perkel



On 03/02/16 08:02, Axb wrote:

On 03/02/2016 04:52 PM, Marc Perkel wrote:

My Redis bayes keeps growing. It acts like it's not expiring like it
should. Do I need to do something to force expire? Also - anything else
I should set?

Here's my settings.

bayes_sql_dsn  server=localhost:6379
use_bayes 1
use_bayes_rules 1

# Your choice if you want to use auto_learn
bayes_auto_learn  1

use_learner 1
bayes_learn_to_journal 0

# THIS IS MANDATORY - You do NOT need to run sa-learn to expire tokens
# *_ttl below takes care of it.
bayes_auto_expire  1

# You will need to change this according to your need
# This replaces sa-learn's sql/file based expire routines.
bayes_token_ttl 3d
bayes_seen_ttl  1d




run "redis-cli info" and see "expired_keys"


expired_keys:0

This doesn't look right.

db0:keys=56725213,expires=3,avg_ttl=257005915
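
To read that keyspace line: `keys` is the total number of keys in the database and `expires` is how many of them currently have a TTL set. A small Python sketch that parses the line as pasted above and reports how many keys have no expiry; it only parses text and does not talk to Redis.

```python
def parse_keyspace(line):
    """Parse a keyspace line such as 'db0:keys=10,expires=3,avg_ttl=5'."""
    db, _, fields = line.strip().partition(":")
    stats = {}
    for field in fields.split(","):
        key, _, value = field.partition("=")
        stats[key] = int(value)
    return db, stats

db, stats = parse_keyspace("db0:keys=56725213,expires=3,avg_ttl=257005915")
without_ttl = stats["keys"] - stats["expires"]
print(f"{db}: {without_ttl} of {stats['keys']} keys have no expiry set")
```

With only 3 of roughly 56.7 million keys carrying a TTL, the tokens are being stored without any expiry at all, which matches expired_keys staying at 0.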



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Redis Bayes Expire

2016-03-02 Thread Marc Perkel
My Redis bayes keeps growing. It acts like it's not expiring like it 
should. Do I need to do something to force expire? Also - anything else 
I should set?


Here's my settings.

bayes_sql_dsn  server=localhost:6379
use_bayes 1
use_bayes_rules 1

# Your choice if you want to use auto_learn
bayes_auto_learn  1

use_learner 1
bayes_learn_to_journal 0

# THIS IS MANDATORY - You do NOT need to run sa-learn to expire tokens
# *_ttl below takes care of it.
bayes_auto_expire  1

# You will need to change this according to your need
# This replaces sa-learn's sql/file based expire routines.
bayes_token_ttl 3d
bayes_seen_ttl  1d


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Error when trying to re-use Bayes database from one server to another

2016-02-13 Thread Marc Perkel


On 02/13/16 00:42, Reindl Harald wrote:



On 13.02.2016 at 02:56, Marc Perkel wrote:

For what it's worth - just used Redis. Redis is the only thing that's
worked reliably for me


you can't use Redis when it comes to different servers in different 
networks for different clients


BDB works fine and reliably, at least without autolearning and 
autoexpire and having the bayes-db path read-only for the running 
spamd with namespaces


0      60388  SPAM
0      21651  HAM
0    2510401  TOKEN

total 73M
-rw------- 1 sa-milt sa-milt 10M 2016-02-13 09:12 bayes_seen
-rw------- 1 sa-milt sa-milt 81M 2016-02-13 09:12 bayes_toks



I'm filtering 5000 domains using a single redis server and 4 SA servers.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Error when trying to re-use Bayes database from one server to another

2016-02-13 Thread Marc Perkel


On 02/13/16 07:25, Reindl Harald wrote:



On 13.02.2016 at 15:59, Marc Perkel wrote:

On 02/13/16 00:42, Reindl Harald wrote:


On 13.02.2016 at 02:56, Marc Perkel wrote:

For what it's worth - just used Redis. Redis is the only thing that's
worked reliably for me


you can't use Redis when it comes to different servers in different
networks for different clients

BDB works fine and reliably, at least without autolearning and
autoexpire and having the bayes-db path read-only for the running
spamd with namespaces

0      60388  SPAM
0      21651  HAM
0    2510401  TOKEN

total 73M
-rw------- 1 sa-milt sa-milt 10M 2016-02-13 09:12 bayes_seen
-rw------- 1 sa-milt sa-milt 81M 2016-02-13 09:12 bayes_toks



I'm filtering 5000 domains using a single redis server and 4 SA servers


looks like you refused to understand 'different networks'

it's fine in your infrastructure but it won't work in the cases we 
have in real life where another company with independent 
infrastructure fetches our bayes in context of a subscription over 
webservices, moves the files to a temp folder and trains its own samples 
before replacing the local bayes with the result


reason?

2510401 tokens with dump in and dump out is horribly slow when the 
number of local samples is around 1000 messages versus 82 messages 
we have fed since 2014


or would you open your redis server for 3rd parties on the WAN?



I'm using SSH tunneling to keep my redis private.

Maybe I'm not understanding what you are trying to do?

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: URIBL/DNSBL from a database

2016-02-12 Thread Marc Perkel


On 02/12/16 05:39, Alex wrote:

Hi,

For some time now I've been cycling URLs and IPs through  a mariadb
database gathered from incoming mail on a honeypot I've created.
Surprising how many are received ahead of spamhaus/barracuda.

I'm looking for ideas on how to now make this information available to
spamassassin on my production system. I'd like to somehow export the
IPs, any URLs in the body, and email addresses to spamassassin.

Is it possible for spamassassin to query a database directly?

I'm familiar with how to create a uridnsbl, but is DNS the best
approach here? The info needs to be updated and reloaded rapidly, and
not all the info (URLs, emails) are conducive to being in DNS.

Is anyone else doing this, and are you just rejecting the IPs at the
SMTP level outright?

Thanks,
Alex




Yeah - unless you write your own SA module, using DNS is the quick, easy 
solution.
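
For the IP part, a DNS list lookup is just a query for the reversed octets under the list's zone. The Python sketch below only builds the query name; the zone `bl.example.org` is a placeholder, not a real list, and the actual lookup and the export from the database into a DNS server are left out.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# bl.example.org is a hypothetical zone used only for this example.
print(dnsbl_query_name("192.0.2.99", "bl.example.org"))
```

By convention a listed address answers with an A record in 127.0.0.0/8 and an unlisted one returns NXDOMAIN.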


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Marc Perkel
Any chance that the parent directory structure doesn't have enough 
permissions?


The error message says it can't access it so there's your clue. Since 
the files themselves seem to have good permissions I would look at the 
directories.
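
A quick way to check every directory on the way down to the file is sketched below in Python. It is only an illustration: `os.access` answers for the user running the script, so it would need to be run as the spamd user to be meaningful, and the path is the one from the message below.

```python
import os
import stat

def directory_chain(path):
    """Yield (directory, mode string, searchable) for each ancestor of path."""
    current = os.path.dirname(os.path.abspath(path))
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    for directory in reversed(chain):
        try:
            mode = stat.filemode(os.stat(directory).st_mode)
        except OSError as exc:
            # Missing or unreadable directory: report it instead of crashing.
            yield directory, f"<{exc.strerror}>", False
            continue
        yield directory, mode, os.access(directory, os.X_OK)

for directory, mode, searchable in directory_chain("/var/spool/spamd/bayes/bayes_toks"):
    print(mode, "ok     " if searchable else "BLOCKED", directory)
```

Any line marked BLOCKED is a directory the current user cannot traverse, which would explain a "cannot open" error even when the files themselves are world-writable.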


On 02/12/16 08:29, Sebastian Arcus wrote:
As per advice from this list, I have been re-using my bayes databases 
on several different servers running SA. On one of the servers though, 
the database is not accepted. I re-transferred them several times over 
ssh, to make sure they were not corrupted. The database files are in 
the correct location, with correct permissions and owned by the 
correct user:


# ls -l /var/spool/spamd/bayes/
total 5912
-rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen
-rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks

The version of SA on both the donor and receiving servers is 3.4.1.

When I try to learn a new message on the receiving server (where I 
moved the bayes files), I get the following error:


# su - spamd -c "/usr/bin/sa-learn -D --spam /New\ UnansweredSexHookup\ Request.eml"




Feb 12 16:20:53.777 [12973] dbg: locker: mode is 438
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: created /var/spool/spamd/bayes/bayes.lock.mdr-server.mdrinteriors.co.uk.12973
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: trying to get lock on /var/spool/spamd/bayes/bayes with 0 retries
Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: link to /var/spool/spamd/bayes/bayes.lock: link ok
Feb 12 16:20:53.778 [12973] dbg: bayes: tie-ing to DB file R/W /var/spool/spamd/bayes/bayes_toks
Feb 12 16:20:53.779 [12973] dbg: bayes: untie-ing DB file toks
Feb 12 16:20:53.779 [12973] dbg: locker: safe_unlock: unlink /var/spool/spamd/bayes/bayes.lock
bayes: cannot open bayes databases /var/spool/spamd/bayes/bayes_* R/W: tie failed: No such file or directory
Learned tokens from 0 message(s) (1 message(s) examined)
Feb 12 16:20:53.779 [12973] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x93106d0) implements 'learner_close', priority 0
ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 498.







--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Error when trying to re-use Bayes database from one server to another

2016-02-12 Thread Marc Perkel
For what it's worth - just used Redis. Redis is the only thing that's 
worked reliably for me.




Question about spam report header

2016-02-02 Thread Marc Perkel
Normally SA creates a header that has a list of the names of rules that 
matched. It skips the listing of hidden rules that start with __ .


Is there a command where I can easily tell SA to include the hidden 
rules in the report in the headers so I can see all of it?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Question about spam report header

2016-02-02 Thread Marc Perkel

perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*

This converts the rules. I'm doing something very interesting. It's 
going to take a few days to see if it works.
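
Note that the one-liner rewrites every double underscore in the files, including any that happen to sit inside a regex or in the middle of a name. A narrower variant, sketched here in Python as an illustration rather than what was actually run, renames only a leading `__` on a rule name.

```python
import re

# Matches a leading double underscore on a rule name: "__" that is not
# preceded by a word character and is followed by a letter or digit.
HIDDEN_PREFIX = re.compile(r"(?<![A-Za-z0-9_])__(?=[A-Za-z0-9])")

def expose_hidden_rules(rule_text):
    """Rename __NAME to T_NAME without touching "__" inside other tokens."""
    return HIDDEN_PREFIX.sub("T_", rule_text)

line = "meta  CT_GOD_BEGGER  __CT_GOD && __CT_BEGGER"
print(expose_hidden_rules(line))
```

A pattern that itself begins with `__` after a non-word character would still be rewritten, so the output is worth diffing before use.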


I'm applying the same techniques of my evolution filter to the SA rule 
names.


I extract the names and then run them into a program that creates all 
combinations up to 4 levels and learns those combos as either spam or ham.


Then after building ham and spam corpus sets I take the test message - 
create a set of rule combinations and then do set compares against the 
ham and spam sets.


What I'm looking for is combos matching ham and NOT matching spam - or - 
combinations matching spam and NOT matching ham.


In theory I should be able to create thousands of combination rules for 
both ham and spam that all have a very high probability of being accurate. 
It's just an interesting experiment to see how well it works.
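
The learning step described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual program: the function names are made up, counts are kept in memory, and rule names within a combination are sorted so the same set of rules always produces the same key.

```python
from collections import Counter
from itertools import combinations

def rule_combos(rule_names, max_size=4):
    """All combinations of 1..max_size rule names, each as a sorted tuple."""
    names = sorted(set(rule_names))
    for size in range(1, max_size + 1):
        yield from combinations(names, size)

ham_counts, spam_counts = Counter(), Counter()

def learn(rule_names, is_spam):
    """Count every combination from one message in the ham or spam table."""
    target = spam_counts if is_spam else ham_counts
    target.update(rule_combos(rule_names))

learn(["HTML_MESSAGE", "DKIM_VALID", "MAILTO_LINK"], is_spam=False)
learn(["HTML_MESSAGE", "MISSING_DATE", "SUBJ_GROUP"], is_spam=True)

# Combinations seen in one class and never in the other.
ham_only = set(ham_counts) - set(spam_counts)
spam_only = set(spam_counts) - set(ham_counts)
print(len(ham_only), len(spam_only))
```

The number of combinations grows quickly: a message that hits 20 rules yields 20 + 190 + 1140 + 4845 = 6195 combinations at up to 4 levels, so storage and some form of forgetting matter in practice.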


Right now I have 151728 ham combinations, 113632 spam combinations. Of 
those only 22933 are in both sets. It's only been learning for one day. 
I want to see where it is after a week.


By changing the rules from __ to T_ I exposed a lot more rule names. 
The way this works is that I don't need to know what rules are ham rules 
or spam rules in advance. And I don't need to score them. The filter 
figures it all out on its own. So the rule names are just information.


I think this trick will make SA far more accurate. We'll see. I want to 
give it till at least Friday for the system to learn. I'm also storing 
hit counts so that I could pick out maybe the best 1000 rules and 
publish them.


Anyhow - that's what I'm up to and so far results are good. But because 
it's early in the learning cycle most messages are not yet producing 
significant scores. The ones that are producing scores are making the 
right call however.




On 02/02/16 20:19, Dave Funk wrote:
You can do that but it requires editing all your rule files, although 
then you see those matches in all your reports.


If you just want to test one particular message, just use the -D 
option to spamassassin and grep for ' got hit: '


Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ==> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ==> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ==> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ==> got hit: ""
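
If the goal is a list of rule names rather than raw debug lines, the same output can be post-processed. A small Python sketch, assuming the line layout shown in the sample above (the exact arrow length and wording may differ between versions, so the pattern is deliberately loose):

```python
import re

# Captures the rule name from lines of the form
# "... dbg: rules: ran header rule __HAS_FROM ==> got hit: ..."
# Any whitespace, including a line break, is allowed between the parts.
GOT_HIT = re.compile(r"dbg: rules: ran \w+ rule\s+(\S+)\s+=+> got hit:")

def hit_rules(debug_output):
    """Return rule names, hidden ones included, in order of first appearance."""
    seen = []
    for name in GOT_HIT.findall(debug_output):
        if name not in seen:
            seen.append(name)
    return seen

sample = '''Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION
==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule
__TO_HEADER_EXISTS ==> got hit: "<"
'''
print(hit_rules(sample))
```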


(Yes, Marc, you probably already know this, this is for the other 
people who might be following this thread ;)


On Tue, 2 Feb 2016, Marc Perkel wrote:


Never mind 

I found that if I change __ to T_ that it does what I want.


On 02/02/16 18:05, Marc Perkel wrote:


On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules 
that matched. It skips the listing of hidden rules that start with 
__ .


Is there a command where I can easily tell SA to include the hidden 
rules in the report in the headers so I can see all of it?




I'm also - I suppose - asking it to list rules that match but produce 
no scores.


body      __LATE_RICH_RELATIVE  /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body      __CT_CLICK            /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body      __BENEFICIARY         /\bbeneficiary\b/i
body      __CT_BEGGER           /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body      __CT_CONTACT          /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i
body      __CT_REPLY_TO_ME      /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body      __CT_DYING            /\b(diagnosed with|months to live|dying of|transplant)\b/i
body      __CT_UNITED_NATIONS   /\bUnited Nations?\b/i

meta      __CT_STRANGER         CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta      __CT_MONEY            CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta      __CT_VICTIM           __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta      __CT_FORM             FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta      __CT_CONFIDENTIAL     CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDE

Re: Question about spam report header

2016-02-02 Thread Marc Perkel


On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules 
that matched. It skips the listing of hidden rules that start with __ .


Is there a command where I can easily tell SA to include the hidden 
rules in the report in the headers so I can see all of it?




I'm also - I suppose - asking it to list rules that match but produce no 
scores.


body      __LATE_RICH_RELATIVE  /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body      __CT_CLICK            /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body      __BENEFICIARY         /\bbeneficiary\b/i
body      __CT_BEGGER           /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body      __CT_CONTACT          /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i
body      __CT_REPLY_TO_ME      /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body      __CT_DYING            /\b(diagnosed with|months to live|dying of|transplant)\b/i
body      __CT_UNITED_NATIONS   /\bUnited Nations?\b/i

meta      __CT_STRANGER         CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta      __CT_MONEY            CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta      __CT_VICTIM           __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta      __CT_FORM             FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta      __CT_CONFIDENTIAL     CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2
meta      __CT_NOW              CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND

meta      CT_GOD_BENEFICIARY    __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY    God and Beneficiary
score     CT_GOD_BENEFICIARY    4

meta      CT_GOD_BEGGER         __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER         Begging in Religious Language
score     CT_GOD_BEGGER         3


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Question about spam report header

2016-02-02 Thread Marc Perkel

Never mind 

I found that if I change __ to T_ that it does what I want.


On 02/02/16 18:05, Marc Perkel wrote:


On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules 
that matched. It skips the listing of hidden rules that start with __ .


Is there a command where I can easily tell SA to include the hidden 
rules in the report in the headers so I can see all of it?




I'm also - I suppose - asking it to list rules that match but produce 
no scores.


body      __LATE_RICH_RELATIVE  /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body      __CT_CLICK            /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body      __BENEFICIARY         /\bbeneficiary\b/i
body      __CT_BEGGER           /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body      __CT_CONTACT          /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i
body      __CT_REPLY_TO_ME      /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body      __CT_DYING            /\b(diagnosed with|months to live|dying of|transplant)\b/i
body      __CT_UNITED_NATIONS   /\bUnited Nations?\b/i

meta      __CT_STRANGER         CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta      __CT_MONEY            CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta      __CT_VICTIM           __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta      __CT_FORM             FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta      __CT_CONFIDENTIAL     CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2
meta      __CT_NOW              CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND

meta      CT_GOD_BENEFICIARY    __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY    God and Beneficiary
score     CT_GOD_BENEFICIARY    4

meta      CT_GOD_BEGGER         __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER         Begging in Religious Language
score     CT_GOD_BEGGER         3




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Can your bayes do this?

2016-01-20 Thread Marc Perkel

OK - Just to show you this isn't Bayesian - see if you can do this.

Here is a list of 5505874 words and phrases used in the subject line of 
HAM and never seen in the subject line of SPAM


http://www.junkemailfilter.com/data/subject-ham.txt

Here is a list of 3494938 words and phrases used in the subject line of 
SPAM and never seen in the subject line of HAM


http://www.junkemailfilter.com/data/subject-spam.txt

Hope you understand it now. Not Bayesian




Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
Yes - you missed something. It is about intersecting one corpus and NOT 
intersecting the other.


This is about what doesn't match - not what does.

On 01/20/16 10:26, Shawn Bakhtiar wrote:

Sorry.. how is this different than Naive Bayes filtering??

"Naive Bayes classifiers work by correlating the use of tokens 
(typically words, or sometimes other things), with spam and non-spam 
e-mails and then using Bayes' theorem to calculate a probability that 
an email is or is not spam."

— https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

"the set of fingerprints of the test message is intersected with the 
spam and ham corpora creating subsets of matches. Then you do a set 
diff both ways (ham - spam) (spam - ham) and whichever side is bigger 
wins. Generally it will match on only one side or very predominantly 
on one side." — Marc Perkel


You are still looking up words/phrases in a dictionary set, and coming 
up with a probability factor of which side it falls on (an application 
of Bayes' theorem).


Or did I miss something?



On Jan 20, 2016, at 9:17 AM, Wrolf <wr...@wrolf.net 
<mailto:wr...@wrolf.net>> wrote:


Good luck with your patent application, it should be in the 
infinitely elastic queue right after my perpetual motion machine.


Not sure how you will deal with the number of ham tokens in spam 
messages. Also not sure how much ham will get canned as spam - but 
then, maybe people shouldn't be sending each other poetry?


haiku by email
blossoms in my inbox
drink morning coffee


;-)


Wrolf
wr...@wrolf.net <mailto:wr...@wrolf.net>


My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
OK - following up on this. I have my provisional patent filed. I'm still 
doing development to improve it and working on a licensing contract. The 
license will be based on the Creative Commons Model Patent License with 
some restrictions added. Basically I want to get a license fee from the 
big guys and my spam filtering competitors. So unless you are in the spam 
filtering business or have more than 10,000 email addresses, it's not 
going to cost you anything.


I'm going to describe the concept here. I'm not going to share my code 
because my code is specific to my system and it is a combination of bash 
scripts, Redis, Pascal, PHP, and Exim rules. And the open source 
programmers are likely to implement it better than I have. Basically I'm 
trying not to put myself out of business, and this new method is a bigger 
breakthrough than Bayesian filtering.


Maybe I should call it a new plan for spam?

So - I'm just going to introduce the concept of how it works right now. 
Once you know what I'm doing it should be easy to implement. I had it 
working in a couple of days and I'm not an outstanding programmer. One 
thing to keep in mind is that this is a paradigm shift. It's not about 
matching - *it's about NOT matching*. And although it is far better at 
catching spam, its best feature is actively identifying good email.


The secret sauce

Suppose I get an email with the subject line "Let's get some lunch". I 
know it's a good email because spammers never say "Let's get some lunch". 
In fact there are an infinite number of words and phrases that are used 
in good email that are never ever used in spam. And if I'm using words 
and phrases *never used in spam* that are used in ham - it's good email. 
And similarly - if I'm using words and phrases that are used in spam and 
*never used in ham* - it's spam.


So - how do I get a list of words and phrases never used in spam? I 
create a list of words and phrases that are used in spam and check 
whether a given phrase is *not on the list*.


What I do is tokenize the spammiest parts of the email, like the subject 
line, into phrases of 1, 2, 3, and 4 words.


the quick brown fox jumps over the lazy dog - becomes

"the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" 
"brown fox" "quick brown fox" "the quick brown fox" "jumps" "fox jumps" 
"brown fox jumps" "quick brown fox jumps" "over" "jumps over" "fox jumps 
over" "brown fox jumps over" "the" "over the" "jumps over the" "fox 
jumps over the" "lazy" "the lazy" "over the lazy" "jumps over the lazy" 
"dog" "lazy dog" "the lazy dog" "over the lazy dog"


These tokens are learned as ham or spam and added to sets. I'm using 
Redis to do this because it has extremely fast set operations. I don't 
know of anything other than Redis that can do this. So think about Redis 
as the way to implement this.
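
The phrase expansion shown above can be sketched in a few lines. This is only an illustration in Python, not the author's code (which he describes as bash, Redis, Pascal, PHP, and Exim rules); the function name `ngram_tokens` and the plain whitespace split are assumptions.

```python
def ngram_tokens(text, max_words=4):
    """Return every phrase of 1..max_words consecutive words.

    Phrases are emitted in the same order as the example in the post:
    for each word, the phrases that end at that word, shortest first.
    """
    words = text.split()
    tokens = []
    for end in range(1, len(words) + 1):
        for size in range(1, max_words + 1):
            start = end - size
            if start < 0:
                break  # not enough words yet for a phrase this long
            tokens.append(" ".join(words[start:end]))
    return tokens


tokens = ngram_tokens("the quick brown fox jumps over the lazy dog")
print(tokens[:6])
print(len(tokens), "tokens,", len(set(tokens)), "distinct")
```

Because the tokens end up in sets, the repeated "the" collapses into a single member once the message is stored.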


A new message comes in. It is tokenized and fingerprinted and hundreds 
of fingerprints are generated. Then it's all set operations. The set of 
fingerprints of the test message is intersected with the spam and ham 
corpora, creating subsets of matches. Then you do a set diff both ways 
(ham - spam) (spam - ham) and whichever side is bigger wins. Generally 
it will match on only one side or very predominantly on one side.
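
A minimal sketch of the learn-and-score step described above, using plain Python sets in place of Redis. The names `learn` and `evolution_score` and the toy phrases are hypothetical; the sign convention (positive means spam) follows the formula given elsewhere in this thread.

```python
def learn(corpus, tokens):
    """Add a message's tokens to a corpus (a plain Python set here)."""
    corpus.update(tokens)


def evolution_score(message_tokens, ham, spam):
    """Positive means spam, negative means ham, zero means no verdict."""
    test = set(message_tokens)
    ham_only = (test & ham) - spam    # seen in ham, never seen in spam
    spam_only = (test & spam) - ham   # seen in spam, never seen in ham
    return len(spam_only) - len(ham_only)


ham, spam = set(), set()
learn(ham, ["lunch", "get some lunch", "free"])
learn(spam, ["free", "free pills", "pills"])

# "free" is in both corpora, so it is neutral and counts for neither side.
print(evolution_score(["get some lunch", "lunch", "free"], ham, spam))
print(evolution_score(["free pills", "free", "unseen phrase"], ham, spam))
```

A phrase that appears in neither corpus contributes nothing, which matches the later remark that a message sometimes produces no result at all.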


So I'm not just tokenizing the subject. I also tokenize the first 25 
words of the message, the text of links in the message, the name part of 
the From address, the header names, the attachment names, the PHP script 
if there is one, and various behavior characteristics (slow, no quit, no 
RDNS, number of MIME parts, multiple recipients, etc.).


SpamAssassin is all about matching rules. This is all about not 
matching. Not matching allows you to compare to an infinite set rather 
than a finite set. So when spammers start misspelling words to avoid 
matching the rules, my system catches that and makes its own rules. The 
tricks that spammers use now make it easier to catch them using this 
method.



I will post a link to a better explanation later when I write one. But I 
wanted to let you all know this wasn't just a tease from some crazy person.


So - here's what I want to see happen.

I'd like to see SA implement this. I will provide a license to include 
with it, giving most people a free license, sort of like how Spamhaus 
isn't free to everyone but is in SA. Then the new method will take 
off and eventually I'll get a little something for this.


This new method (I'm calling it the Evolution Spam Filter because the 
algorithm mimics evolution) doesn't just block spammers, it decimates 
them. It's not just a treatment - it's the cure. I hate spam, and 
although I could have kept this secret and made money having the best 
spam filter on the planet, I decided I had a moral obligation to make 
this generally available. I think this will save the global economy 
billions of dollars in recovered productivity and in crime and fraud 
prevention.


I'm seeing close to 100% accuracy. It is so accurate it's scary 

Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 10:36, John Hardin wrote:

On Wed, 20 Jan 2016, Marc Perkel wrote:

So it still needs to be trained, at least initially, with a 
manually-vetted corpus. If not, how do you propose to do the initial 
classification of messages for training?


Do you envision it being self-training past that point? What if it 
goes off the rails? How would you keep it from going off the rails?


If it's not self-training then you have the same issues with the 
reliability of the people feeding the training corpus.


On my system I have a long list of good email sources that are 100% 
white listed and I also have hackerbot traps that are 100% spam. I use 
these for training to keep it on the rails. Good question though.




So I'm not just tokenizing the subject. Also the first 25 words of 
the message


OK, good. I was thinking it would be *really* sensitive to "bayes 
poisoning". Ignoring all but the first part of the body helps.


I assume you're only considering the portion that would render as 
visible to the recipient. Of course, that brings in all the logic 
regarding "what is visible to the recipient?" and all the HTML 
obfuscation we're already seeing to get around Bayes and "only scan 
the first part of the message".




Actually it's very insensitive to poisoning. Yes, a spammer might cancel 
out some good phrases every now and then, but since my system relies on 
NOT matching on one side, it's not as sensitive as Bayes. If they poison 
with the same phrases twice, I have them.






More details on my evolution filter patent

2016-01-20 Thread Marc Perkel
Here are the details of how the filtering system is structured. This is 
what I filed:


http://www.junkemailfilter.com/patent/patent.pdf

The drawings that go with it:

http://www.junkemailfilter.com/patent/patent1.pdf
http://www.junkemailfilter.com/patent/patent2.pdf
http://www.junkemailfilter.com/patent/patent3.pdf
http://www.junkemailfilter.com/patent/patent4.pdf
http://www.junkemailfilter.com/patent/patent5.pdf

This will be my licensing template:

https://wiki.creativecommons.org/wiki/Model_Patent_License




Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 11:25, John Hardin wrote:

On Wed, 20 Jan 2016, Marc Perkel wrote:

On 01/20/16 10:44, Antony Stone wrote:


 How do you identify "the spammiest parts" of an email?


The subject line, the first few words of the email, the header 
structure, behavior, and the file extensions of attached files.


Are you getting .zip/.rar/etc archive directory listings for that, 
too? I recommend you do so, that would help trap malware.




I throw the extensions into the mix as "information". The filter makes 
the associations. If there are file extensions that spammers never use 
then the message is ham.







Re: More details on my evolution filter patent

2016-01-20 Thread Marc Perkel


On 01/20/16 11:45, Axb wrote:


And what does all this FUSSP have to do with SA?


It could probably be implemented in SA with little effort.












My new method for blocking spam - example

2016-01-20 Thread Marc Perkel
Let me give you an example. Here are two subject lines. It's easy to 
guess which one is spam.


"Meet horny Russian Brides online!"
"I read an article about Russian brides in a magazine."

Bayes or SpamAssassin would look at "Russian Brides" and see that 499 out 
of 500 times it's spam. Therefore the non-spam version scores spam points.


In my system "Russian brides" is neutral because it is used in both spam 
and ham. But on the spam side, these phrases are used in other spam and 
*not matched* in ham:


Meet horny
horny Russian
horny Russian brides
brides online!
online!

On the ham side, these phrases are used in ham and *not matched* in spam:

I read an article
read an article
an article about
brides in a magazine
in a magazine

My filter gets both correctly because of NOT matching. Not matching is a 
comparison to an infinite set.
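
The two subject lines can be run through a toy version of the method. Everything here is an assumption made for illustration: the four training subjects are invented, the helper names `phrases`, `corpus`, and `score` are hypothetical, and punctuation is left attached to words as in the lists above.

```python
def phrases(subject, max_words=4):
    """All phrases of 1..max_words consecutive lowercase words."""
    words = subject.lower().split()
    return {" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + max_words, len(words)) + 1)}


def corpus(subjects):
    """Union of the phrase sets of every training subject."""
    learned = set()
    for subject in subjects:
        learned |= phrases(subject)
    return learned


def score(subject, ham, spam):
    """Spam-only matches minus ham-only matches; positive means spam."""
    test = phrases(subject)
    return len((test & spam) - ham) - len((test & ham) - spam)


# Invented training data: "russian brides" shows up on both sides.
spam = corpus(["Horny Russian brides online!", "Meet horny singles tonight"])
ham = corpus(["I read an article about cooking",
              "Russian brides in a magazine story"])

print("russian brides" in spam and "russian brides" in ham)
print(score("Meet horny Russian Brides online!", ham, spam))
print(score("I read an article about Russian brides in a magazine.", ham, spam))
```

With this training data the shared phrase cancels out, and the verdict for each subject comes entirely from the phrases seen on one side only.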




Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 10:44, Antony Stone wrote:

On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote:


Suppose I get an email with the subject line "Let's get some lunch". I
know it's a good email because spammers never say "Let's go to lunch".
In fact there are an infinite number of words and phrases that are used
in good email that are never ever used in spam.

Surely this is going to change as soon as enough people implement your
filtering system - spammers will use legitimate phrases from ham, both in the
subject line and the body of their emails, and thereby get classified as ham?

Matching ham doesn't get you classified as ham. Not matching spam does. 
Matching ham is neutral if spammers use it too.


At some point the spammer wants you to do something, and if they imitate 
ham perfectly then they don't have a message and it's no longer spam. 
(Besides, I tokenize behavior as well.)



So, you're identifying ham by checking that it does not contain words or
phrases which you have previously seen in spam...

Sounds very much like Bayes to me.


Bayes compares the new email to what's inside the ham and spam boxes. 
What I do is compare inside the box on one side and outside the box on 
the other. Bayes is about matching. The Evolution filter is about NOT 
matching.





What I do is tokenize the spamiest parts of the email, like the subject
line

How do you identify "the spammiest parts" of an email?


The subject line, the first few words of the email, the header 
structure, behavior, the file extensions of attached files, the name 
part of the From address, and the text inside links.





I'd like to see SA implement this.
I'm not going to share my code because my code is specific to my system and
it a combination of bash scripts, redis, pascal, php, and Exim rules. And
the open source programmers are likely to implement it better than I have.

Given that you have *some* source code, no matter how bad / buggy / specific it
is, I think you'll get much greater take-up (and also comprehension of exactly
what your technique is) if you at least publish that and invite people to
improve on it, rather than say "here's a method idea - you guys code it".


The heart of the code is what I do with Redis. It's just set operations.

Intersect Ham diff Spam to get ham matches.
Intersect Spam diff Ham to get spam matches.

Count the lines - Subtract the result - and you have a score.
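
A hedged sketch of how those steps might map onto Redis set commands (SINTERSTORE, SDIFF). The function takes any client object that offers the redis-py style methods `sadd`, `sinterstore`, `sdiff`, and `delete`; the key names `ham`, `spam`, `msg:1`, and `tmp:*` are hypothetical. So that the example runs without a server, a tiny in-memory stand-in is included. It is not Redis and only mimics the four calls used.

```python
class InMemorySets:
    """Minimal stand-in for the few redis-py set methods used below."""

    def __init__(self):
        self.data = {}

    def sadd(self, name, *values):
        self.data.setdefault(name, set()).update(values)

    def sinterstore(self, dest, keys):
        result = set.intersection(*(self.data.get(k, set()) for k in keys))
        self.data[dest] = result
        return len(result)

    def sdiff(self, keys):
        first, *rest = (self.data.get(k, set()) for k in keys)
        return first.difference(*rest)

    def delete(self, *names):
        for name in names:
            self.data.pop(name, None)


def redis_score(client, msg_key, ham_key="ham", spam_key="spam"):
    """Spam-only matches minus ham-only matches; positive means spam."""
    client.sinterstore("tmp:ham", [msg_key, ham_key])     # Test intersect Ham
    client.sinterstore("tmp:spam", [msg_key, spam_key])   # Test intersect Spam
    ham_only = client.sdiff(["tmp:ham", spam_key])        # ... diff Spam
    spam_only = client.sdiff(["tmp:spam", ham_key])       # ... diff Ham
    client.delete("tmp:ham", "tmp:spam")
    return len(spam_only) - len(ham_only)


client = InMemorySets()
client.sadd("ham", "lunch", "get some lunch", "free")
client.sadd("spam", "free", "free pills")
client.sadd("msg:1", "free pills", "free")
print(redis_score(client, "msg:1"))
```

With a real server the same function should accept a `redis.Redis()` client. Fixed temporary key names are a simplification; concurrent scoring would need per-message temporary keys.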





I'm seeing close to 100% accuracy.

1. How close?

Less than 10 a day filtering 5000 domains.



2. On what volume of email?


1.3 million good emails last week.



3. What proportion of spam / ham?


About 10 spam to one ham. But I have a spam baiting system so I get more 
spam than normal.




4. What % false positives / negatives?

Especially good at identifying ham.



5. How many different domains' email are you feeding in to it?

6. How long have you been testing it (ie: how much have you seen of how it
adapts to new spam patterns)?


About 4 weeks now.





Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 11:32, Reindl Harald wrote:



Am 20.01.2016 um 20:27 schrieb Marc Perkel:


On 01/20/16 11:25, John Hardin wrote:

On Wed, 20 Jan 2016, Marc Perkel wrote:

On 01/20/16 10:44, Antony Stone wrote:


 How do you identify "the spammiest parts" of an email?


The Subject line - the first few words of the email. the header
structure, behavior. File extensions of attached files.


Are you getting .zip/.rar/etc archive directory listings for that,
too? I recommend you do so, that would help trap malware.



I throw the extensions into the mix as "information". The filter makes
the associations. If there are file extensions that spammers never use
then the message is ham.


don't get me wrong but "file extensions that spammers never use" does 
not happen given that useable file-extensions are limited


easily to trick out by use attachments with random extensions not part 
of the payload


all what i read here is "i have re-invented bayes" but call it different



Bayes is about matching. My Evolution filter is about NOT matching. It's 
the *NOT matching* that makes it different.





The difference between my Evolution filter and Bayes is ...

2016-01-20 Thread Marc Perkel
Bayes compares the test message to what's in the ham corpus and what's 
in the spam corpus and comes up with a number indicating whether it's 
more like one or the other.


Evolution matches the ham corpus and does NOT match the spam corpus to 
get a ham score. Then it matches the spam corpus and does NOT match the 
ham corpus to get a spam score. Usually the results are all on one side 
or the other, or at least very predominantly on one side.


Sometimes, though, I get no result. Either it matches both or neither. 
But I also look at headers and behavior and that usually gives results 
too. I'm tracking 8 attributes now.






Re: My new method for blocking spam - example

2016-01-20 Thread Marc Perkel


On 01/20/16 11:50, Reindl Harald wrote:



Am 20.01.2016 um 20:46 schrieb Marc Perkel:

Let me give you an example. Here's 2 subject lines. Easy to guess which
one is spam.

"Meet horny Russian Brides online!"
"I read an article about Russian brides in a magazine."

Bayes or spam assassin would look at "Russian Brides" and 499 out of 500
times it's spam. Therefore the nonspam version scores spam points.

In my system "Russian brides" is neutral because it is used in both spam
and ham. But on the spam side, phrases used in other spam *not matched*
in ham


that is *exactly* how bayes works and subject alone is *not* the key

tokenizing the *whole* message with enough spam *and* ham samples is 
the key - so there are two options:


* you re-invented bayes with a different name
* you modified bayes with some tricks and hope
  spammers would not adopt them

anyways, i doubt there is a sane reason for a patent because the 
principles are just prior art -> bayes





Again - Bayes compares what matches. My filter compares what doesn't match.





Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 12:05, RW wrote:

On 01/20/16 10:26, Shawn Bakhtiar wrote:

Sorry.. how is this different than Naive Bayes filtering??

On Wed, 20 Jan 2016 10:52:58 -0800
Marc Perkel wrote:


Yes - you missed something. It is about intersecting one corpus and
NOT intersecting the other.

This is about what doesn't match - not what does.


What you are doing is a special case of an ordinary Bayesian filter. If
you remove Robinson's correction for low-count tokens, or adjust the
Robinson parameters so it has no effect, you end up with tokens that
only occur in spam having a probability of 1, tokens that only occur
in ham having a probability of 0 and tokens that occur in both having a
probability in-between. If you set a cut-off of 0.49... you leave
only the pure tokens behind. And because all the probabilities are 0 or
1 the chi-squared test reduces to comparing the number of spammy and
hammy tokens just as you are doing.

Your multi-word tokenization is exactly the same as in Bogofilter and
most of what you are doing can be done in Bogofilter with a few lines
in the configuration file.

Any value in your scheme must be in the selection of what you
tokenize. The rest is likely holding it back.
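
RW's argument can be checked on toy data. The sketch below is an assumption-laden simplification, not Bogofilter's actual code: it uses raw per-token probabilities with no Robinson smoothing, a hypothetical `min_dev` cut-off just under 0.5, and a plain count of surviving tokens in place of the chi-squared combination. The token counts are invented.

```python
from collections import Counter


def pure_token_score(message_tokens, ham_counts, spam_counts, min_dev=0.499999):
    """Count spammy minus hammy tokens after dropping mixed tokens."""
    spammy = hammy = 0
    for token in set(message_tokens):
        s, h = spam_counts[token], ham_counts[token]
        if s + h == 0:
            continue                  # never seen: carries no evidence
        p = s / (s + h)               # raw spam probability, no smoothing
        if abs(p - 0.5) <= min_dev:
            continue                  # seen on both sides: dropped
        if p > 0.5:
            spammy += 1               # p == 1: spam-only token
        else:
            hammy += 1                # p == 0: ham-only token
    return spammy - hammy


def set_score(message_tokens, ham_counts, spam_counts):
    """The set formulation from this thread, for comparison."""
    test = set(message_tokens)
    ham, spam = set(ham_counts), set(spam_counts)
    return len((test & spam) - ham) - len((test & ham) - spam)


ham_counts = Counter({"lunch": 12, "free": 3, "meeting notes": 5})
spam_counts = Counter({"free": 40, "free pills": 25})

for msg in (["free", "free pills", "lunch", "never seen"],
            ["free pills", "free"],
            ["meeting notes", "lunch"]):
    print(pure_token_score(msg, ham_counts, spam_counts),
          set_score(msg, ham_counts, spam_counts))
```

On these inputs the two scores agree, which is the reduction RW describes. With extremely lopsided counts the floating-point cut-off would need to sit closer to 0.5 to keep the equivalence exact.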





Again - it's not about matching as Bayes does. It's about not matching.

In the subject line of this message, the phrase "method for blocking 
spam" makes the message ham. Spammers never use the phrase "method for 
blocking spam". No other tests are needed. My system's result: 100% ham. 
To Bayes it's just some words.


What makes it ham is what doesn't match, not what does.





Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel


On 01/20/16 12:14, Reindl Harald wrote:



Am 20.01.2016 um 21:11 schrieb Marc Perkel:


Again - it's not about matching as Bayes does. It's about not matching.

In the subject line of the message the phrase "method for blocking spam"
makes the message ham. Spammers never use the phrase "method for
blocking spam". No other tests needed. My system result 100% ham. To
bayes it's just some words

What makes it ham is what doesn't match, not what does


"Spammers never use the phrase" is pure bullshit - sorry, no way to 
express it nicer!




The way I know what spammers never use is that I store what spammers do 
use and see if it doesn't match. I've processed more than 100 million 
spams and it's amazing how many common words and phrases spammers never 
use.





Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Marc Perkel
It could be challenging if someone impersonated a bank and they did it 
right. I'm looking at more aspects than just the content of the message, 
but that's an area where there is some possible weakness. There are 
other tricks to address that specifically. And I am looking at behavior 
and headers as well.


To be a little clearer: this new system isn't perfect. Its main 
strength is identifying good email. It does catch a lot more spam for 
sure, but when people scream at me it's because I blocked something 
important. So think of detecting ham as its big feature.


On 01/20/16 14:37, jdow wrote:
And just how well does this work against spear phishing? And would the 
same magic list work for Ma and Pa Kettle, well into their 80s and only 
receiving emails from their children, and for Freddie Burfle with his 
head buried in a corporate accounts payable office?


{^_^}




Re: Another way to use my filter in SA

2016-01-20 Thread Marc Perkel


On 01/20/16 14:28, John Hardin wrote:


Unfortunately this also requires training. It would render SA a 
product that does not work out-of-the-box.




Actually it could include a pretrained corpus on the rules at least to 
get people started. Could also have someone (like me?) provide it as a 
service that SA would talk to. SA would send the tokens to the service 
and the service would return a score.





Another way to use my filter in SA

2016-01-20 Thread Marc Perkel

Here's another way to use my evolution filtering idea with SA.

Get rid of all the rule scores and just make a list of the rule names. 
From the rule names, generate all combinations of up to 4 rule names per 
fingerprint and learn those fingerprints as either ham or spam. Sort of 
like this:


“A” “AB” “B” “C” “AC” “ABC” “BC” “D” “AD” “ABD” “BD” “CD” “ACD” “ABCD” 
“BCD” “E” “AE” “BE” “CE” “ACE” “BCE” “DE” “ADE” “ABDE” “BDE” “CDE” 
“ACDE” “ABCDE” “BCDE”


Then - when a new message comes in you make the same combo of 
fingerprints from the rule names and then use my formula.


card(Test intersect Spam diff Ham) - card(Test Intersect Ham diff Spam)

Positive result = spam
Negative result = ham
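
One way to generate such rule-name fingerprints is with `itertools.combinations`. This is a sketch under stated assumptions: the function name `rule_fingerprints` and the `+` separator are hypothetical, and names are sorted so that the same set of rules always yields the same fingerprint. The list in the post is illustrative ("sort of like this"); a strict enumeration of up to 4 names out of A to E differs slightly from it, for example it includes A+B+E and leaves out the five-name A+B+C+D+E.

```python
from itertools import combinations


def rule_fingerprints(rule_names, max_rules=4):
    """Every combination of 1..max_rules distinct rule names, joined by '+'."""
    names = sorted(set(rule_names))
    return {"+".join(combo)
            for size in range(1, max_rules + 1)
            for combo in combinations(names, size)}


fingerprints = rule_fingerprints(["A", "B", "C", "D", "E"])
print(len(fingerprints))
print(sorted(fingerprints)[:8])
```

These fingerprints can then be learned and scored with the same set formula as word phrases. The number of combinations grows quickly with the number of rules that fire on a message, which is one practical reason to cap the combination size.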



Re: I have developed a new method of blocking spam that's a game changer

2016-01-14 Thread Marc Perkel
Actually the reason I filed the provisional patent is to start talking 
about it. As long as I file for a real patent in the next year I'm good.


On 01/14/16 03:49, Dianne Skoll wrote:

On Wed, 13 Jan 2016 18:01:09 -0800
Marc Perkel <supp...@junkemailfilter.com> wrote:


When I reveal it I can explain the basic concept in about 2
paragraphs. The core idea is amazingly simple.

OK.  What you need to do next is stop talking about it. :) If you
disclose details, you risk ineligibility for patent protection,
depending on how much time elapses between disclosure and the patent
application.

Next up, you need to get a patent lawyer.  And plan on spending about
$20,000 to do everything you need to get a patent.  You need to have
someone do a patent search and offer an opinion as to patentability.
Then you need to pay the attorney fees, filing fees, etc.

Good luck!

Regards,

Dianne.








Re: I have developed a new method of blocking spam that's a game changer

2016-01-13 Thread Marc Perkel

Nope - that's not it.

When I reveal it I can explain the basic concept in about 2 paragraphs. 
The core idea is amazingly simple.


On 01/13/16 17:52, Dianne Skoll wrote:

Well...

You're light on details, but from the few clues you've given, is it
possible you've (re-)invented a genetic algorithm for spam classification?

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5982390&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5982390
http://teaching.csse.uwa.edu.au/year4/Current/Students/Files/2007/JamesDudley/CorrectedDissertation.pdf

There are quite a few research papers on this.

Anyway, I can't test out your method, but I'll sure keep an eye on
how things evolve. :)

Regards,

Dianne.








I have developed a new method of blocking spam that's a game changer

2016-01-13 Thread Marc Perkel
OK - this might sound a little unbelievable but I'm not making this up. 
I want to introduce this because I'm hoping to release this soon and I 
want to create some buzz and anticipation. I'm not going to talk about 
the details yet but I hope to soon.


I just filed a provisional method patent on the method and tomorrow I'm 
going to be talking to some investor types about it. I'm also working on 
improving the methods I'm using, but this new trick is so accurate that 
1 month ago if someone asked me if this level of accuracy was possible, 
I would have said - no way!


I'm calling it the Evolution Filter. The name is somewhat of a clue to 
how it works.


I'm seeing levels of accuracy getting really close to 100%. And it's 
especially good at actively detecting good email, so false positives are 
almost nonexistent.


I've been filtering spam now for 15 years and been on this list for 
about that long and I'm not the kind of guy to just make this stuff up.


My intent right now is to just get enough IP protection so I can get a 
license fee from the big corps. I plan on giving it away free to the 
little guys, so if you have fewer than 10,000 email accounts it's 
free. I'm hoping to get something like 1 cent per email account per year 
from the big guys.


Although this idea is unique, it's actually rather simple to 
implement. I'm using Redis, and since SA is also using Redis it should be 
trivial to add it to SA. My programming skills are good but not great, 
so the developers here should be able to do a significantly better job 
than me. It only took me an afternoon to implement the concept and it 
was already impressive with just 3 hours of learning.


This is not Bayesian or remotely similar to Bayesian. It does use a DB 
like Bayesian does and there is learning involved. But it's probably 
100x better at detecting spam and 1000x better at detecting good email.


My plan is that this technique is going to be so good that everyone is 
going to immediately implement it. And because of that the big boys will 
license it from me.


The accuracy is so good that it could put many spammers out of business. 
It can recognize spam more accurately than I can by hand looking at 
someone else's email.


If someone on this list wants to verify that I'm not just smoking the 
wrong kind of cigarettes I'm willing to let people test it on the 
condition that you report back here and tell everyone what your 
experience is.


If anyone has some feedback about how I can make this available to 
everyone and make a little something in licensing fees I'm definitely 
listening. I do want to release this to you all soon because you'll 
probably make it better than I have.


I have a little more info on Dvorak's blog.

http://www.dvorak.org/blog/2016/01/12/i-invented-a-new-way-to-filter-spam-thinking-about-a-patent/




Looking for a script to extract readable text from emails

2015-12-28 Thread Marc Perkel
I'm looking for a script to extract readable text from emails. I want it 
de-MIMEd, ignoring HTML, images, etc. What I'm looking for is just the 
readable text (real words). Mostly I just need to extract about the first 
200 characters of real text.


Can someone point me in the right direction?

Thanks in advance.
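
One possible direction, as a minimal sketch rather than a finished tool: 
Python's standard email package can walk the MIME parts and keep only 
text/plain. The function name first_readable_text and the 200-character 
default are illustrative choices, and the sketch assumes the message has 
a text/plain part at all (an HTML-only message would return an empty 
string).

```python
import re
from email import message_from_string, policy


def first_readable_text(raw_message: str, limit: int = 200) -> str:
    """Return roughly the first `limit` characters of plain text in a message."""
    msg = message_from_string(raw_message, policy=policy.default)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, HTML, images, and anything not text/plain.
        if part.get_content_type() != "text/plain":
            continue
        # Skip plain-text files sent as attachments.
        if part.get_content_disposition() == "attachment":
            continue
        # get_content() undoes the transfer encoding and charset for us.
        chunks.append(part.get_content())
    # Collapse runs of whitespace so the result reads as one line.
    text = re.sub(r"\s+", " ", " ".join(chunks)).strip()
    return text[:limit]
```

Feeding it the raw message as a string returns the leading text with 
line breaks and repeated spaces collapsed.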




Re: Trying Bayes / Redis

2015-12-15 Thread Marc Perkel
This Bayes Redis works GREAT. For years I've been trying to get bayes to 
work and now finally IT WORKS






Re: A Plan to Stop Violence on Social Media

2015-12-15 Thread Marc Perkel
Probably yes. But talk about opening a can of worms. If you can detect 
ISIS you can detect anything.


On 12/15/15 20:19, Wrolf wrote:

Stop me if you've heard this one.

Would it be practical to use the Spamassassin techniques of Bayesian 
filtering and RBL lists to block ISIS on social media?


Wrolf
wr...@wrolf.net <mailto:wr...@wrolf.net>





Ignoring internal networks

2015-12-14 Thread Marc Perkel
I have situations where I need to run SA on a message that comes from 
another server. But the server it's coming from is forwarding the 
message and I want that server to be ignored (not scored white) so that 
SA can see the message as if it came to me directly.


I bet there's a command to do that. I just can't find it.




Re: Ignoring internal networks

2015-12-14 Thread Marc Perkel


On 12/14/15 22:49, Benny Pedersen wrote:
On December 15, 2015 6:18:06 AM Marc Perkel 
<supp...@junkemailfilter.com> wrote:



I bet there's a command to do that. I just can't find it.


perldoc Mail::SpamAssassin::Conf

see *networks





I figured it out. Thanks.

internal_networks
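
For anyone landing here with the same question, a minimal local.cf 
sketch of that setting. The address 192.0.2.25 is a placeholder for the 
forwarding server, not a value from this thread. Listing the forwarder 
as internal tells SpamAssassin to treat it as one of your own relays, so 
the network checks are applied to the host that handed the message to 
it. Entries in internal_networks also need to be covered by 
trusted_networks.

```
# 192.0.2.25 is a placeholder for the forwarding server's address
trusted_networks  192.0.2.25
internal_networks 192.0.2.25
```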




OT - Has anyone tried RSPAMD?

2015-12-13 Thread Marc Perkel

And if you have - is it any good? Or am I wasting my time with it?

Thanks in advance. I know it's off topic.




Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 08:47, Reindl Harald wrote:



Am 12.12.2015 um 17:44 schrieb Marc Perkel:

On 12/12/15 02:38, Axb wrote:

On 12/12/2015 12:28 AM, Marc Perkel wrote:


redis_version:2.4.10



You're using an ancient Redis version...
SA makes use of LUA support which was added in 2.6.0

You definitely need to upgrade to 3.x and you'll probably need to nuke
your DB dump before the upgrade...



I'm running CentOS 6. I just downloaded and compiled the latest. Can I
just copy over the executables and keep my data or do I need to start 
over?


If it works with the old data you can keep it; otherwise, as said, you 
"probably need to nuke your DB dump".


But don't do make && make install on production machines.

Sooner or later your build will get overwritten by a yum update with 
the old version + patches, because rpm doesn't know anything about your 
override.


https://wiki.centos.org/HowTos/SetupRpmBuildEnvironment




Worked with the old data.




Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 02:38, Axb wrote:

On 12/12/2015 12:28 AM, Marc Perkel wrote:


redis_version:2.4.10



You're using an ancient Redis version...
SA makes use of LUA support which was added in 2.6.0

You definitely need to upgrade to 3.x and you'll probably need to nuke 
your DB dump before the upgrade...






What does LUA mean?




Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 02:38, Axb wrote:

On 12/12/2015 12:28 AM, Marc Perkel wrote:


redis_version:2.4.10



You're using an ancient Redis version...
SA makes use of LUA support which was added in 2.6.0

You definitely need to upgrade to 3.x and you'll probably need to nuke 
your DB dump before the upgrade...





I'm running CentOS 6. I just downloaded and compiled the latest. Can I 
just copy over the executables and keep my data or do I need to start over?





Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 09:36, Martin Gregorie wrote:

On Sat, 2015-12-12 at 08:49 -0800, Marc Perkel wrote:


I can put redis in the yum exception list


Or, you can install your self-built version in /usr/local/bin and
adjust $PATH so it precedes /usr and /usr/bin. This will protect your
version from yum or dnf updates.

Then keep an eye on subsequent updates and remove your version if/when
yum or dnf delivers a more recent version.


Martin





Actually I did rpm -e --justdb to take it out so it doesn't get updated.

BTW - this is working GREAT. I'm thinking about learning redis and doing 
interesting things with it.





Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 08:47, Reindl Harald wrote:



Am 12.12.2015 um 17:44 schrieb Marc Perkel:

On 12/12/15 02:38, Axb wrote:

On 12/12/2015 12:28 AM, Marc Perkel wrote:


redis_version:2.4.10



You're using an ancient Redis version...
SA makes use of LUA support which was added in 2.6.0

You definitely need to upgrade to 3.x and you'll probably need to nuke
your DB dump before the upgrade...



I'm running CentOS 6. I just downloaded and compiled the latest. Can I
just copy over the executables and keep my data or do I need to start 
over?


If it works with the old data you can keep it; otherwise, as said, you 
"probably need to nuke your DB dump".


But don't do make && make install on production machines.

Sooner or later your build will get overwritten by a yum update with 
the old version + patches, because rpm doesn't know anything about your 
override.


https://wiki.centos.org/HowTos/SetupRpmBuildEnvironment




I can put redis in the yum exception list




Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel

sa-learn --dump magic

0.000  0  3  0  non-token data: bayes db version
0.000  0  69569  0  non-token data: nspam
0.000  0  88747  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count



# Server
redis_version:3.0.5
redis_git_sha1:
redis_git_dirty:0
redis_build_id:a0e516305b2572d8
redis_mode:standalone
os:Linux 2.6.32-042stab112.15 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.7
process_id:2085
run_id:f2436709b788f7a9b6a043f6ed90a49512c049d2
tcp_port:6379
uptime_in_seconds:685
uptime_in_days:0
hz:10
lru_clock:7100234
config_file:/etc/redis.conf

# Clients
connected_clients:550
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:905452784
used_memory_human:863.51M
used_memory_rss:924852224
used_memory_peak:964409608
used_memory_peak_human:919.73M
used_memory_lua:177152
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:98170
rdb_bgsave_in_progress:1
rdb_last_save_time:1449940748
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:7
rdb_current_bgsave_time_sec:1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:3806
total_commands_processed:1302390
instantaneous_ops_per_sec:2133
total_net_input_bytes:52637812
total_net_output_bytes:15806201
instantaneous_input_kbps:80.48
instantaneous_output_kbps:23.47
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:194094
keyspace_misses:65528
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:44672
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:5.97
used_cpu_user:34.09
used_cpu_sys_children:4.85
used_cpu_user_children:57.29

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=3,expires=0,avg_ttl=0
db2:keys=9277218,expires=4,avg_ttl=2546306553
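
A dump like this is easier to watch over time if a script pulls out the 
few numbers that matter. The following is a minimal sketch, assuming the 
"key:value" layout shown above; the function names parse_redis_info and 
summarize are made up for illustration, and the "twice the memory" 
budget is the rule of thumb given elsewhere in this thread for backup 
dumps, not a figure Redis reports.

```python
def parse_redis_info(info_text: str) -> dict:
    """Parse `redis-cli info` output (key:value lines) into a dict of strings."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def summarize(fields: dict) -> dict:
    """Report memory use, a 2x dump budget, and the keyspace hit ratio."""
    used = int(fields["used_memory"])
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    lookups = hits + misses
    return {
        "used_memory_mib": used / 2**20,
        # Rule of thumb from the thread: budget about twice the dataset for a dump.
        "dump_budget_mib": 2 * used / 2**20,
        "hit_ratio": hits / lookups if lookups else 0.0,
    }


# Three values copied from the dump above.
sample = """\
used_memory:905452784
keyspace_hits:194094
keyspace_misses:65528
"""
print(summarize(parse_redis_info(sample)))
```

With the values above this reports about 863.5 MiB in use, so roughly 
1.7 GiB should be budgeted while a background save is running, and a 
hit ratio of about 0.75.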



Re: Trying Bayes / Redis

2015-12-12 Thread Marc Perkel


On 12/12/15 15:28, Benny Pedersen wrote:
On December 12, 2015 5:49:27 PM Marc Perkel 
<supp...@junkemailfilter.com> wrote:



I can put redis in the yum exception list


So many precompiled problems; why not upgrade to CentOS 7?




Because I'd have to upgrade 50 servers for consistency and if I do that 
I'll probably try something other than centos.





Re: Trying to understand how bayes works.

2015-12-11 Thread Marc Perkel


On 12/11/15 06:58, RW wrote:

On Thu, 10 Dec 2015 13:54:05 -0800
Marc Perkel wrote:


Bayes breaks the message down into some sort of tokens and then does
statistics on those tokens as to tokens found in spam vs. tokens
found in ham.

But what about combinations of tokens? I'm thinking that I'd like to
have something that says when it sees tokens X and Y and Z then
that's spam even though X,Y,Z might be in ham when not combined.

Does bayes do that or is there anything that does?

In general making arbitrary combinations is not practical. What some
filters do is make tokens out of word combinations in a sliding window.
This can be very useful in catching difficult spams that are composed
of common neutral words, although in my experience it's a little more
prone to FPs than Bayes.
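
To make the sliding-window idea concrete, here is a minimal sketch. It 
is not the tokenizer of Bogofilter, DSPAM, or SpamAssassin; the function 
name and the default window of 2 words are assumptions chosen for 
illustration. Each run of adjacent words inside the window becomes an 
extra token alongside the single words.

```python
def sliding_window_tokens(text: str, window: int = 2) -> list:
    """Return single-word tokens plus tokens for each run of adjacent words."""
    words = text.lower().split()
    tokens = list(words)
    # Slide a window of 2..window words across the message.
    for size in range(2, window + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens


print(sliding_window_tokens("dear friend i am"))
# ['dear', 'friend', 'i', 'am', 'dear friend', 'friend i', 'i am']
```

Only neighbouring words are combined, which is why this stays practical 
where arbitrary combinations do not.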

I use Bogofilter and DSPAM.

On Thu, 10 Dec 2015 21:28:44 -0800
Marc Perkel wrote:


I'm thinking about incorporating Bogofilter but instead of feeding it
messages I'm thinking about feeding it the SpamAssassin results - the
rule names it hit + other data about the message and then let it
score the rules. That's what I want to experiment with.

I thought of trying something like that myself, but my filtering became
practically perfect before I got around to it, so I never bothered. And
I think there are some problems with it.

The first is that FNs in SpamAssassin tend to come from a lack of
useful information rather than the scoring system failing to combine it
well.

The second is that most rules are either fairly neutral or strongly
spammy. There are few strong ham indicators to balance the rest. You
might be able to balance it with metadata, and reputation information,
but the trick is to do it without getting a high FP rate on new senders.

If you did wish to take account of rule combinations, you'd really have
to do it yourself because sliding-window tokenization wouldn't do it
well.




What I was thinking about doing was creating a string of tokens that 
represented key features of the message. Then run that through a program 
that created new tokens out of every possible combination of 2 tokens 
and adding that to the string. Then running bayes on that. My tokens 
will not be the text of the message but rules hit including a lot of 
rules I create not for points but just for tokens.


For example, I create rules that look for many phrases about a subject, 
and the subject becomes a token. For instance:


JESUS
ROYALTY
MONEY

By themselves these are not indicators of spam. But if you have all 3 
then it's definitely spam. The idea is to not look at words but look at 
the meaning of phrases. For instance, introductions:


Dear (friend)
I am (someone)
I am contacting you because (some reason)

This says - I don't know you.

I am a member of the (Nigerian royal family|Armed forces in Iraq) etc.

These can all be reduced to tokens and then you just look for 
combinations of tokens.
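
The pairing step described above can be sketched in a few lines. This 
is an illustration under assumptions, not the code actually used: the 
function name add_pair_tokens and the "+" separator are invented here, 
and the token names are the three examples from this message.

```python
from itertools import combinations


def add_pair_tokens(tokens):
    """Return the unique tokens plus one joined token per unordered pair."""
    # Sorting makes "JESUS+MONEY" and "MONEY+JESUS" the same token.
    unique = sorted(set(tokens))
    pairs = [f"{a}+{b}" for a, b in combinations(unique, 2)]
    return unique + pairs


print(add_pair_tokens(["JESUS", "ROYALTY", "MONEY"]))
# ['JESUS', 'MONEY', 'ROYALTY', 'JESUS+MONEY', 'JESUS+ROYALTY', 'MONEY+ROYALTY']
```

The number of pair tokens grows as n(n-1)/2, so 30 rule hits on one 
message would add 435 pair tokens to the string.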







Re: Trying Bayes / Redis

2015-12-11 Thread Marc Perkel
So far so good. Thanks for your help. It looks like it's actually 
working. Have about 10k spam/ham tokens so far.




Re: Trying Bayes / Redis

2015-12-11 Thread Marc Perkel


On 12/11/15 13:07, Axb wrote:

On 12/11/2015 09:57 PM, Marc Perkel wrote:

Trying to set up Bayes using redis.

I created a server for redis and have it running.

Can't find the docs on how to create the databases initially and such.
Any tips appreciated.

Thanks in advance.



Did you look into the docs I put in

https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ 
 ?


You don't have to "create" the DB - just look at the redis.conf sample...

The moment the redis server gets data, it will "autocreate" the first 
database, which is the one you'll be using for Bayes.


Those little bits should get you going.

Definitely keep an eye on memory usage * 2

by tuning
bayes_token_ttl 30d
bayes_seen_ttl  14d
you can "tune usage"
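
Put together, a minimal local.cf sketch for the Redis backend might look 
like the fragment below. The server address and database number are 
assumptions and need adjusting to the local setup; the two TTL values 
are the ones suggested above.

```
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
# server address and database number are assumptions; adjust to your setup
bayes_sql_dsn       server=127.0.0.1:6379;database=2
bayes_token_ttl     30d
bayes_seen_ttl      14d
```

Shorter TTLs mean fewer keys held in memory at once, at the cost of 
forgetting older tokens sooner.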

IMPORTANT: If you want to do backup dumps of the DB you'll need twice 
the amount of memory.
Redis starts a second server instance to do the dump, which uses roughly 
the same amount of memory. So if you keep 4GB of Bayes in memory, you 
should keep a bit more than 4GB free for the dump. If the server 
starts swapping during the dump you're shot.


You'll definitely want to do the dumps so Bayes survives reboots, after 
a kernel update for example.
During the reboot, the Bayes plugin will just time out and log that it 
can't connect to the servers. It looks worse than it really is. When the 
server is available it reconnects automatically - no need for a spamd 
reload/restart.


Enjoy



OK - I didn't think it was going to be that easy.





Re: Trying Bayes / Redis

2015-12-11 Thread Marc Perkel
Anyone using this rule timing plugin? Having trouble getting it to work. 
Just wondering if it's worth it?


Mail::SpamAssassin::Plugin::RuleTimingRedis


Re: Trying Bayes / Redis

2015-12-11 Thread Marc Perkel


On 12/11/15 14:59, Axb wrote:

On 12/11/2015 11:55 PM, Marc Perkel wrote:

So far so good. Thanks for your help. It looks like it's actually
working. Have about 10k spam/ham tokens so far.



sa-learn --dump magic should show if it's "alive"

For safety, don't take your eyes off memory usage and play with:

bayes_token_ttl
bayes_seen_ttl

I've set
bayes_seen_ttl 1


Curious what your "redis-cli info" looks like after a while of 
running... and if you're happy with speed and results...


keep us posted...

Axb




0.000  0  3  0  non-token data: bayes db version
0.000  0  14814  0  non-token data: nspam
0.000  0  11933  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0  0  0  non-token data: oldest atime
0.000  0  0  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count


redis_version:2.4.10
redis_git_sha1:
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:472
uptime_in_seconds:4846
uptime_in_days:0
lru_clock:284149
used_cpu_sys:29.75
used_cpu_user:91.39
used_cpu_sys_children:4.31
used_cpu_user_children:49.34
connected_clients:671
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:185037872
used_memory_human:176.47M
used_memory_rss:182960128
used_memory_peak:193339376
used_memory_peak_human:184.38M
mem_fragmentation_ratio:0.99
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:42332
bgsave_in_progress:0
last_save_time:1449876339
bgrewriteaof_in_progress:0
total_connections_received:110460
total_commands_processed:9275568
expired_keys:0
evicted_keys:0
keyspace_hits:1821764
keyspace_misses:793471
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:16204
vm_enabled:0
role:master
db2:keys=2012232,expires=4





Trying Bayes / Redis

2015-12-11 Thread Marc Perkel

Trying to set up Bayes using redis.

I created a server for redis and have it running.

Can't find the docs on how to create the databases initially and such. 
Any tips appreciated.


Thanks in advance.




Re: Try my IXHASH

2015-12-10 Thread Marc Perkel


On 12/10/15 10:58, Bill Cole wrote:

On 10 Dec 2015, at 13:25, Paul Stead wrote:


On 10/12/15 18:23, Paul Stead wrote:

On 10/12/15 17:24, Bill Cole wrote:

On 10 Dec 2015, at 10:48, Paul Stead wrote:


0.004% hit rate on ham


Clarify this please: 4 out of 100k hits are ham (not so bad) OR 4 out
of 100k hams get hit (OUCH)


The former, 4 out of 100k hits are ham emails

Re-clarifying - out of 100k ham emails, 4 of these hit on this iXhash


So: unfit for a high score (e.g. the suggested 5) on a system 
receiving a lot of ham. Good to know.






I think 4 out of 100,000 FP is really good. 58% overlap is more 
confirmation that a spam is a spam. But that means 42% is new spam not 
caught by other iXhash. So - not phenomenal - but not bad.


Thanks for the feedback.




Trying to understand how bayes works.

2015-12-10 Thread Marc Perkel
I've had bayes disabled in SA because it seems to not be able to stay 
working in a high volume situation. The MySQL server can't seem to keep 
up with it even on very fast computers.


But - thinking about trying something interesting - doing my own bayes 
in a different way.


Here's my question.

Bayes breaks the message down into some sort of tokens and then does 
statistics on those tokens as to tokens found in spam vs. tokens found 
in ham.


But what about combinations of tokens? I'm thinking that I'd like to 
have something that says when it sees tokens X and Y and Z then that's 
spam even though X,Y,Z might be in ham when not combined.


Does bayes do that or is there anything that does?
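
To make the question concrete, here is a simplified naive Bayes sketch. 
It is an illustration only: SpamAssassin's own tokenizer and combining 
function differ in detail, and the function names and the add-one 
smoothing are choices made for this example. Each token contributes its 
own evidence independently, so the model has no notion of "spam only 
when X, Y and Z appear together".

```python
import math
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts


def spam_probability(tokens, spam_counts, ham_counts, nspam, nham):
    """Combine per-token evidence, treating tokens as independent."""
    log_odds = math.log(nspam / nham)
    for tok in set(tokens):
        # Add-one smoothing keeps unseen tokens from producing log(0).
        p_spam = (spam_counts[tok] + 1) / (nspam + 2)
        p_ham = (ham_counts[tok] + 1) / (nham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Toy corpus: spam always carries x, y and z together; ham carries one each.
spam = [["x", "y", "z"]] * 3
ham = [["x"], ["y"], ["z"]]
spam_counts, ham_counts = train(spam), train(ham)

for msg in (["x", "y", "z"], ["x"]):
    print(msg, round(spam_probability(msg, spam_counts, ham_counts, 3, 3), 3))
```

In this toy corpus a message containing only x still leans toward spam 
(about 0.667), because x on its own has been seen more often in spam 
than in ham; the combination scores higher (about 0.889) only because 
the three individual scores multiply.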





Re: Trying to understand how bayes works.

2015-12-10 Thread Marc Perkel


On 12/10/15 18:31, Benny Pedersen wrote:

Marc Perkel skrev den 2015-12-10 22:54:

I've had bayes disabled in SA because it seems to not be able to stay
working in a high volume situation. The MySQL server can't seem to
keep up with it even on very fast computers.


I've got a Palm Zire that can do OCR on handwritten text :=)

Pretty good for the kind of CPU it has.


But - thinking about trying something interesting - doing my own bayes
in a different way.


I have tried bogofilter with very good success, and I see problems with 
Bayes here as well. I remember you changed to MariaDB?


At that time you said it worked better than MySQL?

Did it fail again?


Here's my question.

Bayes breaks the message down into some sort of tokens and then does
statistics on those tokens as to tokens found in spam vs. tokens found
in ham.

But what about combinations of tokens? I'm thinking that I'd like to
have something that says when it sees tokens X and Y and Z then that's
spam even though X,Y,Z might be in ham when not combined.

Does bayes do that or is there anything that does?


If Z is scored as spam, and X and Y are ham, then it's ham; basically 
that's how Bayes works. But a single mail might contain a lot to digest 
and compare before Bayes can say spam or not.


test bogofilter

put 100 spam mails in a spam folder
put 100 non spam mails in a ham folder

Train bogofilter with these 2 folders in one go, not first ham and then 
spam; it must be done in one bogofilter training call. Then configure 
the bogofilter.cf plugin for SpamAssassin and test it :=)


YMMV




Yes, MariaDB was better than MySQL but not good enough to keep up. I even 
tried putting the database on a RAM disk and it still didn't work.


I'm thinking about incorporating Bogofilter but instead of feeding it 
messages I'm thinking about feeding it the spamassassin results - the 
rule names it hit + other data about the message and then let it score 
the rules. That's what I want to experiment with.





Re: Try my IXHASH

2015-12-09 Thread Marc Perkel


On 12/09/15 05:50, Rick Macdougall wrote:


Hi,

Most of the messages it flags are messages that would have been caught 
without it. About 2% of the messages it flags are not seen by any other 
markers.


Regards,

Rick




Any false positives?

I suppose catching the same messages again just creates more confidence 
in the spam flagging.





