OT - 18 months and I'm still alive
Hello everyone, I haven't been posting much here lately but I want to let everyone know that I'm still alive. I sent this out to my list of friends. Feel free to share it. And I'm still filtering spam. Hello friends and family, As of today it has been 18 months since I was diagnosed with stage 4 lung cancer on Aug 1 2016. And I'm still alive and feel mostly normal. Back when I was diagnosed I didn't expect to still be around this long, especially considering that I have refused all conventional treatment. In fact, chemo would most likely have had no effect against my cancer and could well have killed me. I have just completed a second round of radiation and immunotherapy drugs, and this time there was no effect, either positive or negative. The idea was that if there is an immune response in progress, this will boost it. But at this point I still have no idea why I'm still alive, because the scans remain confusing. The last scan in early December showed everything the same size: nothing bigger, nothing smaller, no new mets. Most people would be thrilled but I'm confused. The immune system trick I'm doing has a more binary outcome: it either totally wipes out the cancer or it does nothing. So I expected it to be either a lot bigger or a lot smaller. There are several possible explanations. 1. White blood cells also light up on a PET scan, so the cancer might already be totally dead and is very slowly being eaten by the immune system. That would be great. 2. I have been taking an anti-cancer cocktail of my own design that attacks cancer through multiple pathways, and this combination has kept the cancer at a stalemate, neither growing nor shrinking. But it would be odd that no tumor grew or shrank at all. Then again, this cancer is apparently hard to image and the results are not reliable; in the past different doctors have seen different things. 3. I have a very slow-growing cancer, I'm just lucky, and nothing I did had any significant effect. 4. The universe really can't get along without me in it and changed its mind about booting me out. At this point I actually feel like I know less than I used to, but at the moment I'm working on 2 different strategies at the same time. While I am hopeful the immunotherapy trick worked, I have added more anti-cancer supplements and started on the ketogenic diet. If option 2 is correct, my new plan should set the cancer back quite a bit. I recently found out that normal cells can run on either sugar or fat, but cancer runs only on sugar. That immediately led to the ketogenic diet, where I eat no carbs and make up the calories by eating fat. And as a side effect, I'm losing weight. There is also evidence that once I get to a very sugar-starved state, hyperbaric oxygen treatments might really fry cancer. But I'm still designing a cocktail that should really kick butt in a way that has beneficial side effects. The keto diet is very counterintuitive. Bacon good, fruit bad. You have to eat fat to lose fat. But I'm losing weight and women should find me even more irresistible. It's a diet that seems to work well for anyone contemplating dieting. I'm also documenting everything, so if you know people who have cancer, this might be useful. And feel free to pass this email to anyone interested. It might turn out that I'm one of the most advanced minds on the planet for fighting cancer, and if that's true, isn't that just a little bit sad? Here's the link: http://wiki.junkemailfilter.com/index.php/Cancer I'm also discovering that the very premise the oncology world is based on is wrong. Most people believe cancer starts with a genetic mutation. But cancer might instead start as a metabolic disorder that leads to genetic damage.
If this is true, it might be easy to create a general cure for all cancers that targets the unique metabolism of cancer cells. Something like a toxin that activates in the presence of fermentation byproducts, attached to sugar as a delivery mechanism. You get a shot and the next day all your cancer is dead. More on that in the future. So, bottom line: it's more likely than not that I'll still be here next year. But I could still drop dead unexpectedly at any moment. If that happens, either the universe will end or someone will start a cult in my name based on the worst aspects of my personality. (I'm glad I won't be around to see that.) But if I'm dying, I'm doing it way too slowly to be interesting. Thanks for reading my detailed explanation of no news really. Marc Perkel m...@perkel.com Twitter: mperkel -- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
Cell phone networks list?
Does anyone have a list of cell phone network host names that email from cell phones might be coming from? So far I have: mycingular.net, myvzw.com. Can you add to this list?
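A minimal sketch of how such a list might be used, assuming the entries are matched as domain suffixes against the connecting host's reverse-DNS (PTR) name. Only the two entries from the post are included; the function name and the sample hostnames are made up for illustration.

```python
# Hypothetical helper: flag a connecting host as a cell-phone network
# gateway by its reverse-DNS (PTR) name. Only the suffixes from the post
# are listed; extend the tuple as the list grows.
CELL_NETWORK_SUFFIXES = ("mycingular.net", "myvzw.com")


def is_cell_network_host(ptr_name: str, suffixes=CELL_NETWORK_SUFFIXES) -> bool:
    """Return True if ptr_name equals, or is a subdomain of, a listed suffix."""
    name = ptr_name.strip().rstrip(".").lower()
    for suffix in suffixes:
        suffix = suffix.lower()
        # Match whole labels only, so "evilmyvzw.com" does not count.
        if name == suffix or name.endswith("." + suffix):
            return True
    return False


if __name__ == "__main__":
    for host in ("mobile-166-1-2-3.mycingular.net", "mail.example.com"):
        print(host, is_cell_network_host(host))
```

Matching on whole labels rather than a plain `endswith` avoids accidentally flagging look-alike domains.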
Re: Would anyone be interested in a SA enhancing service?
I think people are misunderstanding. It's not a spamd service. It's basically another rule you would add to your config. I think I need to do it first and then talk about it. On 09/22/17 09:33, Kevin A. McGrail wrote: It's very feasible, but it's a blurry off-topic issue to even discuss here for a commercial service. At worst you just make yourself a standard MX-of-record filter system provider. If you want a "plugin" you just offer spamd service restricted by IP address with SSL. Perhaps you are overthinking things? Just offer a complete spamd replacement instead of an extra test. My $0.02. Regards, KAM
Re: Increasing spam level for MX backup server?
Yes - hitting the backup server is a favorite trick of spammers. If you want, you can add a third (fake) backup server: tarbaby.junkemailfilter.com. It returns 451 on everything, which gets rid of some of that spam (spammers don't retry), and I get some training data for my blacklists. On 09/22/17 09:19, Davide Marchi wrote: Hi friends, On Debian Jessie, Postfix 2.11.3 and SpamAssassin 3.4.0-6, I've just set up an MX email backup server and now I realize that new spam comes from the MX backup server. Is there any way to reject any mail coming to the MX backup server if the primary server is up? Also, a lot of spam comes from a fake, nonexistent "alias" of mine. For example, on my server I host i...@foo.org and its aliases ali...@foo.org and ali...@foo.org, and that's all. The spam comes from ali...@foo.org, which doesn't exist. How could I reject it and prevent delivery from these addresses without compromising the backup server? Many many thanks! Davide Italy
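For anyone curious what a "451 on everything" host amounts to, here is a rough Python sketch. It is not the actual tarbaby implementation, whose internals aren't described here. It assumes the fake host is published as an extra MX with the highest preference number (least preferred), greets normally, and then answers every command except QUIT with a 451 temporary failure. A legitimate MTA treats 451 as "try again later" and normally retries, typically via the more preferred MX hosts. The banner, reply texts and port 2525 are illustrative.

```python
import socketserver

GREETING = "220 tarbaby.example ESMTP"  # hypothetical banner
TEMPFAIL = "451 Temporary failure, try again later"


def tarbaby_reply(command_line: str) -> str:
    """Return the reply for one SMTP command: 221 for QUIT, 451 for anything else."""
    parts = command_line.strip().split(None, 1)
    verb = parts[0].upper() if parts else ""
    if verb == "QUIT":
        return "221 Bye"
    return TEMPFAIL


class TarbabyHandler(socketserver.StreamRequestHandler):
    """Greets the client, then tempfails every command until QUIT or disconnect."""

    def handle(self):
        self.wfile.write((GREETING + "\r\n").encode("ascii"))
        for raw in self.rfile:  # one SMTP command per line
            reply = tarbaby_reply(raw.decode("ascii", errors="replace"))
            self.wfile.write((reply + "\r\n").encode("ascii"))
            if reply.startswith("221"):
                break


if __name__ == "__main__":
    # A real MX listens on port 25 (needs privileges); 2525 is for local testing.
    with socketserver.ThreadingTCPServer(("0.0.0.0", 2525), TarbabyHandler) as server:
        server.serve_forever()
```

Keeping the reply logic in a plain function makes it easy to test without opening sockets.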
Re: Would anyone be interested in a SA enhancing service?
Probably both. Not sure. Just trying to see if it's feasible. On 09/22/17 09:12, Kevin A. McGrail wrote: Are you discussing a free or a commercial service? Regards, KAM
Would anyone be interested in a SA enhancing service?
This is something I'm thinking about doing: providing a service that integrates into SA as a plugin and communicates with my servers to return a useful score enhancer. If there is interest, my initial demo test will just be stuffing the subject line into an IP/port and returning a number, where positive is spam and negative is ham. This would just be a proof of concept. The next level would be sending the message headers and eventually the full message. I would need someone to write a simple plugin (I'm not a Perl guy), but how hard can that be? It would eventually need to be encrypted, though. Starting with just the subject won't return a result all the time. Many requests will return 0 if it can't figure it out. If it does return a result that is significantly away from 0, it's probably right. And it is more likely to return a result for ham than for spam, confirming good email as good. Obviously, using the headers and then the whole message will be more accurate. I'm using new techniques no one else is using. So, any interest?
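A rough sketch of what the client side of that proof of concept might look like. The wire format is an assumption, since the post only says "subject in, number out": one UTF-8 line containing the subject, answered by one line holding a signed integer. The margin of 5 for "significantly away from 0" is a made-up placeholder. A real SpamAssassin plugin would be written in Perl; Python is used here only to show the logic.

```python
import socket


def parse_score(reply: str) -> int:
    """Parse the server's one-line reply into a signed integer score."""
    return int(reply.strip())


def interpret(score: int, margin: int = 5) -> str:
    """Positive means spam, negative means ham, near zero means no opinion."""
    if score >= margin:
        return "spam"
    if score <= -margin:
        return "ham"
    return "unknown"


def query_subject_score(subject: str, host: str, port: int, timeout: float = 2.0) -> int:
    """Send one subject line, read one reply line; return 0 on any failure."""
    # Newlines inside the subject would break the line protocol, so flatten them.
    payload = subject.replace("\r", " ").replace("\n", " ") + "\n"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload.encode("utf-8"))
            with sock.makefile("r", encoding="utf-8") as reader:
                return parse_score(reader.readline())
    except (OSError, ValueError):
        # Network errors, timeouts and malformed replies all count as "no opinion".
        return 0
```

Returning 0 on timeouts or garbage keeps an outage in the "no opinion" lane, so it can cost accuracy but never adds points on its own.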
Re: OT - Possibly some good news
Hi Ted, You know what's interesting? The adaptive immune system seems to work a lot like a spam filter or an antivirus program. Technically, what I did was a database update to my immune system to reclassify my cancer as an enemy. And the code to kill the cancer is in the cancer. On 07/05/17 09:54, Ted Mittelstaedt wrote: Hi Marc, There are drugs that will stimulate the production of white blood cells at a tremendous rate; I was on one of these back in '94 when I was dealing with my cancer. It's well known in oncology that everyone always has a few cancer cells bouncing around in their bodies all the time, but that the immune system takes care of them. It's just that cancers grow so fast once they get established that they overwhelm the immune system. That's why once you get diagnosed with cancer your body needs help. My personal opinion is that there's another factor you aren't mentioning here, and that's whether or not you're a "fighter personality" AKA jackass. In short, do you like to fight? (You do, I can tell that just by reading your post.) I've met a number of people with cancer, serious cancers, since my own. Some have later died. But all of the people I've known who have survived their cancers have been fighters. I've never known anyone with a passive, resigned-to-their-fate attitude who has survived a serious cancer. I don't think oncologists like to talk about this much because it seems kind of unfair to say that if you're a nice person (you don't have a fighter personality) you're definitely gonna die, and if you are an asshole (you do), you're probably gonna live (no guarantees, though). Fighters have different ways they fight, also, and medical science really doesn't like arbitrary cures, you know. They want something that works all the time for everyone, the same way. That's why they love the drugs so much and really dislike the holistic crap.
But I'm pretty sure most of the good medical researchers, if you nailed them to a wall, would admit this kind of thing exists, and I daresay there are a hell of a lot of drug researchers out there trying to figure out what drug they can create that will "switch on" the fighter personality... Ted On 7/4/2017 8:45 AM, Marc Perkel wrote: I know this is off topic, but it is looking like I might not die from cancer after all. At some point I'll write something up about how the immune system is like a spam filter. But today, I think I might have cured my incurable cancer. As you all know from my previous announcements, I have been working on designing a custom immunotherapy treatment that has never been tried before, and the hard part, as usual, was getting the doctors to do it. Well, I finally got the treatment and it appears as if it worked. And I stress the word "appears" because it looks like it's going to take about 2 months before imaging shows what's happening. But my cancer symptoms are gone. I am in a state of stunned disbelief. Too early to believe it, too late not to believe it. On Monday June 19 I got an infusion of ipilimumab, which is an immunotherapy drug. 2 days later I got a series of 3 radiation treatments (21st through 23rd). Dosage: 3 fractions of 9 Gy x-rays from a Varian Trilogy set at 9 MV. These treatments were unusual in that instead of irradiating the whole tumor, I asked that they just burn a disk in the center of the main tumor, leaving the rest of the tumor undamaged. This request was very counterintuitive in radiology because they are trained to kill every cancer cell they can possibly hit, and it took a lot of work to get them to deliberately leave tumor undamaged. But that was important because I was turning the tumor into a school, not a battlefield, where I was teaching my immune system what the cancer looked like (antigens) and to classify it as an enemy.
By using partial radiation I created an environment where white blood cells in my immune system could interact with dead cancer and learn it. 4 days after treatment I started getting a reaction: I was queasy, low energy, with aches and pains and chills. When I got home I had a fever of 101, and it occurred to me: is this the fever I was hoping for? Fever indicates that I'm having an immune response. My immune system is fighting something. Was it attacking the cancer? So I took a hot bath and used a heating pad to increase the fever and got it up to 103. I wanted to create heat shock proteins and signal that the battle was on. Wednesday I still had a fever and was rather out of it. Thursday morning the fever broke and all my cancer symptoms were gone. I have adenocarcinoma; the "adeno" part of the name refers to glandular, often mucus-secreting, cells. On Thursday the mucus went to almost none. I had been coughing up a lot even before I was diagnosed last August. I went out and sawed limbs off a tree, hard work, and didn't cough up anything. At night when I lay down and in the morning when I get up, almos
OT - Possibly some good news
elf is victory. I will write more when I find out more. Marc Perkel Random Genius
Re: Outgoing email without DMARC
On 05/02/17 07:14, Rob McEwen wrote: On 5/1/2017 10:30 PM, Marc Perkel wrote: Might be slightly off topic, but I've been running into more delivery problems with outgoing email because I don't use DMARC. I don't know a lot about it, but is there some simple way I can get around this? Kind of a pain in the rear. Marc, This probably has more to do with DKIM than DMARC? Either way... you're not willing to jump (or haven't yet jumped) through the hoops that the largest ISPs/hosters want us all to jump through... meanwhile... so many of them (and for many, many years) have sent such high volumes AND high percentages of outbound spam to all of our SMTPs - to such an extent that you and I would be out of business if our SMTP outbound traffic did that for just one week. I sort of wish they (or many of them... "if the shoe fits...") would clean up their own act FIRST - get the basics done FIRST - before imposing new standards on the rest of us. I'm in the same boat - I'm now having to set aside dozens of hours to get all my various domains updated to DKIM so that they'll have more success sending to a certain large/famous hoster - who has sent my server a shitload of spam over the past several years (not just volume-wise - but percentage-wise... I'd be run out of town if I did that) Yeah - I know what you mean. Many of these ISPs would be blacklisted if they weren't so big. I get an amazing amount of spam from the big guys. I was just wondering what I could do to get by.
Re: Outgoing email without DMARC
On 05/02/17 03:54, RW wrote: On Mon, 1 May 2017 19:30:01 -0700 Marc Perkel wrote: Might be slightly off topic, but I've been running into more delivery problems with outgoing email because I don't use DMARC. How do you know it's because you don't use DMARC? The rejection message specified DMARC as the reason.
Outgoing email without DMARC
Might be slightly off topic, but I've been running into more delivery problems with outgoing email because I don't use DMARC. I don't know a lot about it, but is there some simple way I can get around this? Kind of a pain in the rear.
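For reference, a DMARC policy is just a TXT record published at `_dmarc.<your domain>` (RFC 7489). The sketch below only formats such a record; the domain and report address are placeholders. Publishing `p=none` by itself asks receivers to take no action, and DMARC passes only when SPF or DKIM passes for a domain aligned with the From: header. So a bare record is not guaranteed to clear these particular rejections.

```python
from typing import Optional, Tuple

VALID_POLICIES = ("none", "quarantine", "reject")


def dmarc_record(domain: str, policy: str = "none", rua: Optional[str] = None) -> Tuple[str, str]:
    """Return (record name, TXT value) for a minimal DMARC policy record.

    v=DMARC1 must come first and p= is required; rua= (where aggregate
    reports are sent) is optional.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"policy must be one of {VALID_POLICIES}, got {policy!r}")
    tags = ["v=DMARC1", f"p={policy}"]
    if rua:
        tags.append(f"rua=mailto:{rua}")
    return f"_dmarc.{domain.rstrip('.')}", "; ".join(tags)


if __name__ == "__main__":
    name, value = dmarc_record("example.com", rua="dmarc-reports@example.com")
    # Zone-file style line, for illustration only.
    print(f'{name}. IN TXT "{value}"')
```

Starting at `p=none` with a report address is a common way to watch what receivers see before tightening the policy.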
Re: New whitelisting trick using from and spf
On 03/06/17 15:22, David Jones wrote: From: Marc Perkel <supp...@junkemailfilter.com> Sent: Monday, March 6, 2017 11:05 AM To: users@spamassassin.apache.org Subject: Re: New whitelisting trick using from and spf Do you mean the header From: address? Because anyone doing SPF checks does what you describe on the envelope From: address. Yes - I'm using the header From: address. Not good. SPF should be checked against the envelope-from address, which is more trustworthy. The From: header can be spoofed trivially with no validation/authentication if DMARC is not enabled. Most email is not enabled for actual DMARC checking. Most have SPF enabled. Some have DKIM enabled. But DMARC can go one step further to check the From: header, and most don't do it unless they are a major target of spoofing like PayPal, eBay, etc. Dave Yes - I'm doing something different, and possibly more effective. And it's working really well. Those who spoof would fail the test and not get whitelisted. The fact that From is easier to spoof makes it more effective, not less. So if the From is @paypal.com and the sending host is not SPF-compatible, then it doesn't get whitelisted. Seems to be working very well.
List of legit mass mailers
Just wondering if anyone has, or is interested in, a list of legit mass mailing sources? There are many domains that remail/deliver for other domains that are 95%+ good email. They are not perfect and sometimes get scammed, but they are mostly good. Just wondering if anyone has a list, or is interested in me producing such a list?
Re: New whitelisting trick using from and spf
On 03/06/17 04:19, Matus UHLAR - fantomas wrote: On 05.03.17 10:38, Marc Perkel wrote: Well, new to me. Maybe others have thought of this. Many domains send nothing but good email, and if you whitelist them based on FCRDNS all is good. Been doing that. But... many domains send nothing but good email and they send through reputable email sender services which are mostly good but not perfect. So I can't just whitelist that. What I'm doing now is whitelisting the domains that are good, but doing SPF checks on the From address. Do you mean the header From: address? Because anyone doing SPF checks does what you describe on the envelope From: address. Yes - I'm using the header From: address. If the From address is whitelisted AND the SPF of the From address is good, I pass the email. Or do you do this at the MTA level (which means it's off-topic)? I do it at the MTA level, but it's not off-topic because the concept can be applied to SpamAssassin. Also, I have almost 100,000 domains in my hostkarma.junkemailfilter.com (127.0.0.1) RBL. So I'm passing a lot of good email with this trick.
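A minimal sketch of checking a domain against a domain-based DNS list such as hostkarma. It assumes the list is queried as `<domain>.hostkarma.junkemailfilter.com` and that an answer of 127.0.0.1 means "whitelisted", as the post indicates. Check the list's own documentation for the exact query format and return codes before relying on this. Only the standard resolver (`socket.gethostbyname_ex`) is used.

```python
import socket

HOSTKARMA_ZONE = "hostkarma.junkemailfilter.com"


def dnsbl_query_name(domain: str, zone: str = HOSTKARMA_ZONE) -> str:
    """Build the DNS name to look up for a domain (assumed <domain>.<zone> format)."""
    return f"{domain.strip().rstrip('.').lower()}.{zone}"


def is_whitelisted(domain: str, zone: str = HOSTKARMA_ZONE, white_code: str = "127.0.0.1") -> bool:
    """True if the list answers with white_code; NXDOMAIN or errors mean 'not listed'."""
    try:
        _, _, addresses = socket.gethostbyname_ex(dnsbl_query_name(domain, zone))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
    return white_code in addresses
```

Treating lookup failures as "not listed" means a DNS outage only removes the whitelist bonus instead of blocking mail.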
New whitelisting trick using from and spf
Well, new to me. Maybe others have thought of this. Many domains send nothing but good email, and if you whitelist them based on FCRDNS all is good. Been doing that. But... many domains send nothing but good email and they send through reputable email sender services which are mostly good but not perfect. So I can't just whitelist that. What I'm doing now is whitelisting the domains that are good, but doing SPF checks on the From address. If the From address is whitelisted AND the SPF of the From address is good, I pass the email. I'm still experimenting with this but I think I'm onto something.
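The rule as described can be sketched in a few lines. Take the domain from the header From:, require it to be on the whitelist, and require SPF to pass for that domain from the connecting IP. The SPF evaluation is passed in as a function: `spf_check` is a hypothetical callable returning a result string such as "pass", and in practice it would wrap an SPF library. Standard SPF (RFC 7208) is defined on the envelope sender and HELO identity, so evaluating it against the header From: domain is the deliberate twist here, not standard SPF usage.

```python
from email.utils import parseaddr
from typing import Callable, Container


def header_from_domain(from_header: str) -> str:
    """Extract the lower-cased domain from a From: header, or '' if there is none."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].rstrip(".").lower()


def passes_from_whitelist(
    from_header: str,
    client_ip: str,
    whitelist: Container[str],
    spf_check: Callable[[str, str], str],
) -> bool:
    """Whitelist only if the From: domain is listed AND SPF passes for it.

    A spoofed From: on a listed domain fails the SPF step and is simply not
    whitelisted; it is not marked as spam either.
    """
    domain = header_from_domain(from_header)
    if not domain or domain not in whitelist:
        return False
    return spf_check(client_ip, domain) == "pass"
```

Failing the check only withholds the whitelist bonus, so mail from listed domains sent through an unauthorized host still goes through normal filtering.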
Possibly some good news - OT
. And these 2 substances are all over the place in NIH studies for new cancer treatments for a variety of age-related cancers. After starting to take the Amazon drugs I did notice that I was coughing up less stuff, and when I started taking the prescribed drugs there seemed to be another reduction in mucus production. And that might be a good sign. So I am now taking all 4 drugs in combination, and if I'm right this is the most effective treatment for my specific lung cancer ever used. And it made enough sense to talk my oncologist into giving it a try. At this point I'm optimistic that I have kicked the can down the road and that it's now years and not months. If I get 2 years out of this I'm calling it a win. And I might be dead wrong, it might not work, and unexpectedly dropping dead is a side effect of Vandetanib. But even if that happened, this is still my best choice. In my battle against cancer, round one goes to me. Ultimately, in a battle which includes certain death, the only variable is the quality of the battle. I'm beginning to relate to Klingons in Star Trek in that the fight is as important as the outcome. And this is the kind of fight that represents who I am and how my final adventure plays out. And in some ways defeating the hospital and insurance bureaucracy is probably tougher than figuring out how to cure cancer. So me beating this is very unlikely, but as Elon Musk would say, "Success is at least one of the possible outcomes." Now I'm off to learn more of the big medical words to prepare for round 2. How hard can it be? So, just wanted to share my possible good news, and hopefully I will be around for 2 more Pioneer Awards in the future. That's my new working estimate. Marc Perkel /root
Re: spamassassin and caching nameservers
For what it's worth, I use PowerDNS as a recursive nameserver and am happy with it. Very easy to set up. On 08/22/16 18:15, Alex wrote: Hi all, I've just set up SpamAssassin on a cable connection that appears to have sporadic DNS timeouts using BIND. It shouldn't be so slow that queries time out, but apparently they are. I'm hoping rbldnsd would provide the additional responsiveness needed. I've set up rbldnsd before, to be used as a way to query a local RBL. Has anyone configured it as a local caching nameserver, and if so, could you share your config? I'd like it to listen on localhost/53 in place of BIND, and I would think I would need the root zones in there somewhere, but there don't appear to be many examples of doing this out there to reference. Is it a full-fledged nameserver, suitable enough for MX, A, TXT queries, etc. for this purpose? Thanks, Alex
Re: Matching infinite sets
On 08/22/16 09:06, Dianne Skoll wrote: On Mon, 22 Aug 2016 09:03:38 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote: The ones that are the same are of no interest. Only where it matches one side and not the other. But... but... that's exactly like Bayes if you throw out tokens whose observed probability is not 0 or 1. Also, in your list of tokens, they are all phrases ranging from 1 to 4 words, and that's why you get good results. Multiword Bayes is just as good, and I know that from experience. This is nothing like Bayes. Bayes is creating a mental block. When I describe it to people who don't know Bayes, they immediately get it. If I describe it to people who know Bayes, they confuse it with Bayes. Bayes is a probability spectrum based on a frequency match on both sets. That's not even close to what I'm doing. Also, some of what I'm doing uses all combinations, not just sequential ones. So it's like a system that writes and scores its own rules. I just throw data at it and it classifies it. The real magic is the feedback learning. As it identifies ham, it learns new words and phrases that then match email from other people. So it learns how normal people speak, it learns how spammers speak, and it identifies the DIFFERENCES between the two. And it's completely automated.
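A toy sketch of the feedback loop described above. It keeps every phrase ever seen in ham and in spam, treats only phrases seen on one side as evidence, and after classifying a message, learns its phrases into the winning side so they can match other people's mail later. The class name is hypothetical, and the majority-count decision is a simplification: the post doesn't say exactly how results from different attributes are combined.

```python
class ExclusiveSetFilter:
    """Toy filter: only phrases seen on exactly one side count as evidence."""

    def __init__(self):
        self.seen_ham = set()
        self.seen_spam = set()

    def ham_only(self, phrase):
        return phrase in self.seen_ham and phrase not in self.seen_spam

    def spam_only(self, phrase):
        return phrase in self.seen_spam and phrase not in self.seen_ham

    def classify(self, phrases):
        """Return 'ham', 'spam' or 'unknown' by comparing one-sided hits."""
        phrases = set(phrases)
        ham_hits = sum(1 for p in phrases if self.ham_only(p))
        spam_hits = sum(1 for p in phrases if self.spam_only(p))
        if ham_hits > spam_hits:
            return "ham"
        if spam_hits > ham_hits:
            return "spam"
        return "unknown"

    def learn(self, phrases, label):
        """Record phrases on one side; a phrase seen on both sides stops counting."""
        if label not in ("ham", "spam"):
            raise ValueError("label must be 'ham' or 'spam'")
        target = self.seen_ham if label == "ham" else self.seen_spam
        target.update(phrases)

    def classify_and_learn(self, phrases):
        """Classify, then feed the verdict back so new phrases get learned."""
        label = self.classify(phrases)
        if label != "unknown":
            self.learn(phrases, label)
        return label
```

The feedback step is what lets a phrase such as "radiation dose" become ham evidence after it first shows up alongside known-ham phrases.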
Re: Matching infinite sets
On 08/22/16 08:58, RW wrote: On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote: On 08/22/16 07:28, Dianne Skoll wrote: The other two possibilities (no tokens in either or some tokens in both) are undecidable. Exactly! In the past you've said that when there are tokens in both you compare the counts. I do a very little bit of that. I make additional sets I call nearly-ham and nearly-spam, where the ratio is very high, and count them as a half score.
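One way the nearly-ham/nearly-spam idea could be expressed, as a sketch. A phrase seen only in spam scores +1 and one seen only in ham scores -1. A phrase seen on both sides counts as half a point only when one side outnumbers the other by a large ratio. The post says only that the ratio is "very high", so the default of 20 is an arbitrary placeholder. Positive scores mean spam.

```python
def phrase_weight(ham_count: int, spam_count: int, ratio: float = 20.0) -> float:
    """Weight for one phrase given how often it was seen in ham and in spam."""
    if spam_count and not ham_count:
        return 1.0   # spam-only phrase
    if ham_count and not spam_count:
        return -1.0  # ham-only phrase
    if ham_count and spam_count:
        if spam_count >= ratio * ham_count:
            return 0.5   # "nearly-spam": counts as half a point
        if ham_count >= ratio * spam_count:
            return -0.5  # "nearly-ham"
    return 0.0  # unseen, or seen on both sides at similar rates


def message_score(phrases, counts, ratio: float = 20.0) -> float:
    """Sum phrase weights; counts maps phrase -> (ham_count, spam_count)."""
    return sum(phrase_weight(*counts.get(p, (0, 0)), ratio=ratio) for p in set(phrases))
```

For example, with counts `{"cheapest viagra online": (0, 12), "viagra": (3, 90)}`, a message containing both phrases scores 1.0 + 0.5 = 1.5.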
Re: Matching infinite sets
On 08/22/16 07:45, Dianne Skoll wrote: On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote: So. What percentage of emails using your algorithm are actually decidable? Almost 100% if you look at a wide variety of tokens from multiple attributes. I can't believe that, or I'm missing something. Almost every spam I see contains words that also appear in ham. Things like "this" or "invoice" or "regards" or "dear". What am I missing? Hi Dianne, what you're missing are word combinations. Usually it's not a single word but a combination of words that triggers a result. Example of how NOT matching works: Let’s take 2 subject lines and see how this works. “Meet hot Russian Brides Online!” “I read an article about Russian Brides in a magazine” A traditional spam filter using Bayesian or hard-coded rules about “Russian Brides” might determine that only 1 out of 500 emails mentioning the phrase “Russian Brides” is a good email. Thus the second line would have points assessed against it in the classification process using these traditional methods. Using the Evolution Filter, the phrase “Russian Brides” is in both sets and therefore has no influence on the results. But the first subject matches these phrases in the Spam Only set: “Meet hot” “Meet hot Russian” “Meet hot Russian Brides” “hot Russian Brides Online!” “Russian Brides Online!” “Brides Online!” “Online!” The second subject matches these phrases in the Ham Only set that are never used in the spam set: “I read an article” “read an article” “read an article about” “about Russian” “an article about” “in a magazine” “Brides in a” So even though the phrase “Russian Brides” has no influence, each subject hits either ham or spam many times, where the same phrase was never used in a subject line in the opposite set. And the number of hits is significant enough just from these subjects to cause the fingerprints to be learned, and that’s just looking at the Subject attribute.
When this is combined with testing all attributes, the messages usually come out strongly on one side or the other. In rule-based systems one would not normally build a whitelist rule to allocate points based on seeing the phrase “read an article about”. That’s where the Evolution Filter is different. It didn’t need to have that rule: since it is comparing against the infinite set of what is not matched on the other side, it dynamically creates billions of rules automatically.
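The Russian Brides example can be reproduced in a few lines, as a sketch. The phrase sets below are tiny hand-made stand-ins for what would really be learned from a large corpus. The point is only that a phrase present in both sets ("russian brides") drops out, while phrases seen on one side decide the result. Lower-casing and whitespace splitting are simplifying assumptions.

```python
def phrases(text: str, max_words: int = 4) -> set:
    """All contiguous word sequences of 1..max_words words, lower-cased."""
    words = text.lower().split()
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, min(i + max_words, len(words)) + 1)
    }


def exclusive_hits(subject: str, spam_seen: set, ham_seen: set):
    """Return (spam-only hits, ham-only hits); phrases found in both sets are ignored."""
    found = phrases(subject)
    shared = spam_seen & ham_seen
    return (found & spam_seen) - shared, (found & ham_seen) - shared


if __name__ == "__main__":
    # Hand-made stand-ins for learned phrase sets.
    spam_seen = {"meet hot", "hot russian brides", "brides online!", "russian brides"}
    ham_seen = {"read an article", "in a magazine", "russian brides"}
    for subject in ("Meet hot Russian Brides Online!",
                    "I read an article about Russian Brides in a magazine"):
        spam_hits, ham_hits = exclusive_hits(subject, spam_seen, ham_seen)
        print(subject, "| spam:", sorted(spam_hits), "| ham:", sorted(ham_hits))
```

Neither subject gets any credit or penalty for "russian brides"; each is decided entirely by the phrases around it.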
Re: Matching infinite sets
On 08/22/16 07:40, Antony Stone wrote: On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote: On 08/22/16 07:28, Dianne Skoll wrote: What percentage of emails using your algorithm are actually decidable? Almost 100% if you look at a wide variety of tokens from multiple attributes: subject, body, content flags, header structure, combinations of all domains referenced, PHP scripts, name part of From addresses, behavior flags. I would have said that a very large number of the words used in spam mails are the same as the words used in ham mails, so I suspect I'm confused about what constitutes a "token". The ones that are the same are of no interest. Only where it matches one side and not the other. I fail to see how the "name part of from addresses" is unlikely to match ham, for example, since I see quite a lot of spam apparently from myself. Antony. Some spammers have Viagra in the name part. The name part is very spammy. I also store To and From email addresses so that relationships between people corresponding create a ham result. (I filter outbound as well for some people.)
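A sketch of how tokens from different attributes might be kept apart. This way "viagra" in a Subject and "viagra" in a From: display name are separate fingerprints, and sender/recipient pairs can build up a ham history. The attribute prefixes and the function name are hypothetical. The post lists more attributes (body, header structure, domains, behavior flags) than are shown here, and the 2-word phrase cap is arbitrary.

```python
from email.utils import parseaddr


def fingerprints(subject, from_header, to_addrs, max_words=2):
    """Return attribute-prefixed tokens for one message (hypothetical scheme)."""
    tokens = set()

    # Subject phrases of 1..max_words consecutive words.
    words = subject.lower().split()
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_words, len(words)) + 1):
            tokens.add("subject:" + " ".join(words[i:j]))

    # Words from the display-name part of the From: header.
    display_name, sender = parseaddr(from_header)
    for word in display_name.lower().split():
        tokens.add("fromname:" + word)

    # Sender -> recipient pairs, so regular correspondents build a ham history.
    sender = sender.lower()
    if sender:
        for rcpt in to_addrs:
            tokens.add(f"pair:{sender}->{rcpt.strip().lower()}")

    return tokens
```

Because every token carries its attribute, the same word can be neutral in one place (both sides use it in subjects) and one-sided in another (only spammers put it in the display name).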
Re: Matching infinite sets
On 08/22/16 07:37, Antony Stone wrote: On Monday 22 August 2016 at 16:34:09, Marc Perkel wrote: OK - Trying to make this really simple. Just talking about concept now. Let's say I get an email where the subject is "I have adenocarcinoma of the lung". Right off you know it's ham because spammers never use the word "adenocarcinoma" and normal people do. Spammers also never use: "of the lung" "the lung" "adenocarcinoma of" How do you create those boundaries to define the tokens? Here's an example: "the quick brown fox jumps over the lazy dog" becomes ... "the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" "brown fox" "quick brown fox" "the quick brown fox" "jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps" "over" "jumps over" "fox jumps over" "brown fox jumps over" "the" "over the" "jumps over the" "fox jumps over the" "lazy" "the lazy" "over the lazy" "jumps over the lazy" "dog" "lazy dog" "the lazy dog" "over the lazy dog" So - tell me you follow this so far. Spammers don't spam about adenocarcinoma. In this case I'm identifying ham because in some previous email people were talking about lung cancer and those phrases were learned as ham. But what makes it really ham is not just that it matches previous ham, but that it doesn't match previous spam. A word like Viagra, for example, would produce no score because it is in both sets. However, "cheapest viagra online" would match spam and not match ham, indicating it's spam. So what makes "cheapest Viagra online" a token, such that "cheapest" and "online" are not tokens? They would all be tokens. I was just pointing out one that would match spam and not match ham. "cheapest" and "online" would likely be in both sets and would be ignored.
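The phrase expansion in that example is every run of 1 to 4 consecutive words. Here is a short sketch that produces exactly that list, in the same order (grouped by the word each phrase ends on). The 4-word cap is read off the example rather than stated as a rule, and the function name is illustrative.

```python
def contiguous_phrases(text: str, max_words: int = 4) -> list:
    """Every run of 1..max_words consecutive words, grouped by the word it ends on."""
    words = text.split()
    out = []
    for end in range(len(words)):
        for n in range(1, max_words + 1):
            start = end - n + 1
            if start < 0:
                break  # not enough earlier words for a longer phrase
            out.append(" ".join(words[start:end + 1]))
    return out


if __name__ == "__main__":
    for phrase in contiguous_phrases("the quick brown fox jumps over the lazy dog"):
        print(f'"{phrase}"')
```

A 9-word sentence yields 1 + 2 + 3 + 4 + 4 + 4 + 4 + 4 + 4 = 30 phrases, with "the" appearing twice because it occurs twice in the sentence.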
Re: Matching infinite sets
OK - trying to make this really simple. Just talking about the concept now. Let's say I get an email where the subject is "I have adenocarcinoma of the lung". Right off you know it's ham because spammers never use the word "adenocarcinoma" and normal people do. Spammers also never use: "of the lung" "the lung" "adenocarcinoma of"

So - tell me you follow this so far. Spammers don't spam about adenocarcinoma. In this case I'm identifying ham because in some previous email people were talking about lung cancer and those phrases were learned as ham. But what makes it really ham is not just that it matches previous ham, but that it doesn't match previous spam. A word like Viagra, for example, would produce no score because it is in both sets. However "cheapest viagra online" would match spam and not match ham, indicating it's spam. The magic here is that this detects both spam and ham. And it is especially good at detecting ham, which greatly reduces false positives.
Re: Matching infinite sets
On 08/22/16 07:28, Dianne Skoll wrote:
On Mon, 22 Aug 2016 07:16:41 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote:
Antony, Yes - I don't store Set B. I store Set A. B is defined by what's NOT in A. So I test A and if it's not matched it's set B. Set B is just a negative match on A.

Let me ask you a question. As far as I understand your algorithm, if an email contains at least one token in the "ham" set and zero tokens in the "spam" set, you classify it as ham. And conversely, if it contains at least one spam token but zero ham tokens, you classify it as spam.

YES! YES! YES! Although I look at thousands of "fingerprints" to get a more significant result.

The other two possibilities (no tokens in either or some tokens in both) are undecidable.

Exactly!

So. What percentage of emails using your algorithm are actually decidable?

Almost 100% if you look at a wide variety of tokens from multiple attributes: subject, body, content flags, header structure, combinations of all domains referenced, PHP scripts, the name part of From addresses, behavior flags.

Regards, Dianne.
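Dianne's restatement is a four-way decision on two yes/no questions. A literal sketch of just that rule, not of Marc's full scorer (he says he looks at thousands of fingerprints per message); the token sets and the names `Verdict` and `decide` are made up for illustration.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"
    SPAM = "spam"
    UNDECIDED = "undecided"


def decide(message: set[str], ham: set[str], spam: set[str]) -> Verdict:
    """Dianne's restated rule: one-sided hits decide, anything else does not."""
    hits_ham = not message.isdisjoint(ham)
    hits_spam = not message.isdisjoint(spam)
    if hits_ham and not hits_spam:
        return Verdict.HAM
    if hits_spam and not hits_ham:
        return Verdict.SPAM
    # No tokens in either set, or tokens in both: undecidable under this rule.
    return Verdict.UNDECIDED


# Toy token sets, invented for the example.
ham = {"of the lung", "let's get some lunch"}
spam = {"cheapest viagra online", "click on this link"}

print(decide({"of the lung", "hello"}, ham, spam))    # Verdict.HAM
print(decide({"cheapest viagra online"}, ham, spam))  # Verdict.SPAM
print(decide({"let's get some lunch", "click on this link"}, ham, spam))  # Verdict.UNDECIDED
```

The last case, a friendly opener plus a phishing link, is exactly the mixed message this rule cannot decide on its own.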
Re: Matching infinite sets
On 08/22/16 06:55, Antony Stone wrote:
On Monday 22 August 2016 at 15:46:41, Dianne Skoll wrote:
On Mon, 22 Aug 2016 06:04:49 -0700 Marc Perkel <supp...@junkemailfilter.com> wrote:
Set A - a finite set - has some members. Set B - an infinite set - is everything that is NOT in Set A.

Set B is a very special case of an infinite set. We're talking about infinite sets in general. Also, you have to realize that although set B is in principle infinite, in practice it is not. Computers have finite memory, and although the number of email tokens representable in the memory of a computer is very, very, very large, it's not infinite.

I do not think that Marc is proposing to actually store set B in a computer (or anywhere else). Set B is simply a theoretical construct, defined as the complement of Set A, and to discover whether something is a member of it, you do not search through the infinite set B for a match; you instead check all members of finite set A for a non-match. If nothing in Set A matches X, then X is a member of Set B. Antony.

Antony, Yes - I don't store Set B. I store Set A. B is defined by what's NOT in A. So I test A and if it's not matched it's set B. Set B is just a negative match on A.
Re: Matching infinite sets
I'm confused by the confusion here. Set A - a finite set - has some members. Set B - an infinite set - is everything that is NOT in Set A. So you match a test item against Set A and if it matches, it's a member of A. If it doesn't match Set A, it's a member of B. How is this not really simple?
Matching infinite sets
Actually - you can match an infinite set. And maybe this is what's hard for some people to wrap their heads around. Suppose set A contains 2 items, apples and oranges. We define set B as everything in the universe that is not in set A. So set B is an infinite set: everything in the universe EXCEPT apples and oranges. Our first test set contains an orange - so it matches set A and not set B. Our second test set contains a cherry - so it doesn't match set A but it does match set B. When you have a method that matches against infinite sets, it completely changes how you think about spam and ham detection.

On 08/16/16 12:57, Shawn Bakhtiar wrote:
By the way, you can't match an infinite set (well, theoretically, but not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)
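Antony's point in this thread is that set B is never stored: membership in B is a negative match against the finite set A. A small sketch of that idea with the apples-and-oranges example; the class name `Complement` is hypothetical. The object can answer "is x in B?" but cannot list or count B, which is the practical difference Shawn and Dianne are pointing at.

```python
class Complement:
    """Everything NOT in `base`. Only the finite set `base` is ever stored."""

    def __init__(self, base):
        self.base = frozenset(base)

    def __contains__(self, item) -> bool:
        # Membership in B is just a failed lookup in A.
        return item not in self.base


set_a = {"apple", "orange"}
set_b = Complement(set_a)

print("orange" in set_a, "orange" in set_b)  # True False
print("cherry" in set_a, "cherry" in set_b)  # False True
```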
Re: I have some bad news
For what it's worth, I have noticed that people who are familiar with Bayesian filtering seem to have a mental block when it comes to understanding this. People who know nothing about Bayesian filtering get it instantly. Here's the actual formula:

card(Test_message intersect Spam diff Ham) minus card(Test_message intersect Ham diff Spam)

On 08/17/16 09:16, Shawn Bakhtiar wrote:
On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not SPAM - which would be a HAM result. If it matches SPAM and does NOT match HAM - then it's SPAM. The magic is in the NOT matching on the other side.

So, if mail matches both hammy and spammy tokens (or token sets), you don't classify at all?

I guess what is confusing me (and I imagine others, as alluded to by Matus) is the fact that you are describing a special condition of Bayes' probability theorem. You are testing two variables (match SPAM and match HAM; not matching is simply the negation of matching), thus giving you four conditions:
1) SPAM && HAM
2) SPAM && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM
Here is a great diagram to show the four probable conditions: https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg

So (if I am correct) Matus is asking: what if condition 1 is true? How are you classifying an email then? That is often the state of most emails, and thus the reason for Naive Bayes spam filtering, which generates a probability based on Bayes' probability theorem and is the conventional methodology to date. A rose by any other name...

Condition 4 is obvious: it's nothing you have ever seen, so classifying it as anything other than HAM would be a huge mistake (IMHO), and it is fully covered by the aforementioned theorem, as the probability of SPAM would (should) be 0.
Same with Condition 3: obviously it never hits SPAM, so whether it matches HAM or not you're going to treat it as HAM anyway, same as condition 4. That leaves condition 2, which (if I'm not mistaken) is "... it matches SPAM and does NOT match HAM - then it's SPAM." Which brings us back to Matus's question: what if the email contains a single HAM token? Two HAM tokens? This is exactly what Bayes' probability theorem is designed for. All you are doing is defining a special condition in which the HAM probability is ZERO. I think that's where I need to understand a bit more about what HAM means in this solution: does getting a hit on HAM somehow negate it being SPAM completely? In other words, if the email contains some set of tokens that are SPAM, yet only one HAM token, does that single HAM token make it not SPAM? If so, you have a long way to go in convincing me that this is a good solution.

So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set)

Actually I get SPAM with that very set of tokens in it. If somehow the HAM rating of it overrides the SPAM, I don't believe it would have a desirable effect. I get plenty of:
"Hay Shawn, Hope you have time to do some lunch, click on this link and check out my new pictures! Wannabe Phisher"
Based on your example there are plenty of HAM and SPAM tokens in there. "Click on this link" has a high probability of SPAM-e-ness; would it get HAMed based on "hope you have time to do lunch"? Or am I missing something?

Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong. Does that make sense?

Again, what you are saying makes sense in that it is a special condition of probability theory. What does not make sense is why you would not simply use probability theory, which already encompasses that condition.
-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Linux - It's now safe to turn on your computer.
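Marc's formula written with Python set operators, as a sketch. Positive results lean spam and negative lean ham, following the order of subtraction in the formula. Since (T ∩ S) \ H equals T ∩ (S \ H), the precedence of "intersect" and "diff" does not change the result. The corpora and the name `score` are invented for illustration.

```python
def score(message: set[str], spam: set[str], ham: set[str]) -> int:
    """card(T & (Spam - Ham)) - card(T & (Ham - Spam)).

    Fingerprints found in both corpora, or in neither, contribute nothing.
    """
    spam_only = spam - ham  # fingerprints learned only from spam
    ham_only = ham - spam   # fingerprints learned only from ham
    return len(message & spam_only) - len(message & ham_only)


# Toy corpora, made up for the example.
spam = {"click on this link", "viagra", "cheapest viagra online"}
ham = {"get some lunch", "viagra", "of the lung"}

print(score({"cheapest viagra online"}, spam, ham))                # 1
print(score({"viagra"}, spam, ham))                                # 0
print(score({"get some lunch", "click on this link"}, spam, ham))  # 0
```

The last call is Shawn's lunch-plus-phishing-link case: with only two fingerprints the hits cancel out. Marc's position elsewhere in the thread is that looking at hundreds of fingerprints per message is what usually pushes the total decisively to one side.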
Re: I have some bad news
On 08/17/16 03:51, Antony Stone wrote:
On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not SPAM - which would be a HAM result. If it matches SPAM and does NOT match HAM - then it's SPAM. The magic is in the NOT matching on the other side. So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set)

What length are the tokens you store in the list? Single words (so the above lunch example would contain 4 tokens)? Entire phrases (so the above would be just 1 token)? Also, how do you deal with spam which contains random cuttings from legitimate texts (generally along with a graphic attachment and/or a URL to get across the "real" message)?

I tokenize a lot of different things, but the fingerprints are at most 3 to 4 tokens long. If you go longer, the database gets too big. And in the body I'm just looking at the first 50 words, and a "concept parser" that looks at the whole body. http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter

Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong.

Does this mean that people with bad spelling will more likely get classified as spam, because they do not match the 'ham' group very well?

No - unless they misspell a lot of words the same way spammers misspell them. If normal people misspell a word and spammers don't misspell it the same way, it can count as ham - or be ignored.

Also, what happens to mail that contains lots of tokens which match neither set (for example, perfectly legitimate email which happens to be in a language the system hasn't been trained with)?

Mail that doesn't match either side produces no score. Antony.
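The limits Marc gives (first 50 body words, fingerprints of at most 3 to 4 tokens) put a hard ceiling on how many body fingerprints one message produces. A quick count, assuming fingerprints are contiguous runs of whitespace-separated words; header, subject and concept-parser fingerprints are not included. The function name is hypothetical.

```python
def fingerprint_count(num_words: int, max_len: int) -> int:
    """Number of contiguous word runs of length 1..max_len in a text of num_words words."""
    return sum(max(num_words - n + 1, 0) for n in range(1, max_len + 1))


print(fingerprint_count(50, 3))  # 50 + 49 + 48 = 147
print(fingerprint_count(50, 4))  # 50 + 49 + 48 + 47 = 194
print(fingerprint_count(9, 4))   # 30, the "quick brown fox" example
```

So a single message body contributes on the order of 150 to 200 fingerprints under these limits, which is consistent with the "hundreds of fingerprints" Marc mentions once other attributes are added.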
Re: I have some bad news
On 08/17/16 03:43, Matus UHLAR - fantomas wrote:
On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not SPAM - which would be a HAM result. If it matches SPAM and does NOT match HAM - then it's SPAM. The magic is in the NOT matching on the other side.

So, if mail matches both hammy and spammy tokens (or token sets), you don't classify at all?

If a fingerprint matches both, it creates no score on that item. The idea is to generate a lot of fingerprints so that something scores. If you look at enough stuff to generate hundreds of fingerprints and you have big reference corpora, then you will usually get a result on something, usually a big result in one direction. But ignoring fingerprints that are in both makes it more immune to poisoning.
Re: I have some bad news
Hi Shawn, What I'm doing is looking for fingerprints in email that intersect HAM and not SPAM - which would be a HAM result. If it matches SPAM and does NOT match HAM - then it's SPAM. The magic is in the NOT matching on the other side. So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set) Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong. Does that make sense?

On 08/16/16 12:57, Shawn Bakhtiar wrote:
Marc, Let me first say I am truly sorry to hear about your cancer. I lost my father to cancer just over a decade ago, after a long battle with sarcoma of the throat and tongue. So I pray and wish you the best. I sent this to you in January 2016 (don't recall if I ever got a reply to it), but based on your document:

Set theory is not my strongest suit, but your diagram looks incorrect:
http://www.junkemailfilter.com/patent/patent5.pdf

Let:
H be ham
S be spam
E be an email

Then you state that:
HE = (H u E)
SE = (S u E)

But then the next diagram shows that there is some solution in which (HE u SE), and thus there may be some set which is (HE / SE), even though in the first diagram S and H do not intersect.

This is not logical. Either (H u S), in which there are tokens common to the ham and spam token sets, or it does not, so which is it? In other words, if a token is both ham and spam, how are you calculating its weight? Is it spam or ham?
Clearly it's the latter (they do not intersect) as described in this:
http://www.junkemailfilter.com/patent/patent2.pdf

In which case you are simply looking to see if (H u E) > (S u E), and it has nothing to do with what is not in the set, and there is indeed no (H u S) or the negation or NOT which is (H / S), so as everyone has been trying to explain, it has NOTHING to do with what is NOT matched.

By the way, you can't match an infinite set (well, theoretically, but not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)

Since the current Bayes learns both SPAM and HAM, I imagine that it does a very similar thing, other than perhaps the larger multi-word token sets, which seem a trivial thing to add and are available in other tool sets.

I'll only add this: if you believe that your SPAM has been greatly reduced, that's awesome! But have you really isolated it to this "new technique", or in playing around have you inadvertently changed something else that may have changed your results? I am also not saying that you have not developed some "new technique", but that if you have, your description of it does not line up logically with the technique itself. Back in January you were looking to patent it; today you simply want it to live on. I suggest that if it is indeed the latter, then perhaps it's time to release the source code/scripts and let a few more eyes look at the logic to see exactly what it is doing that you believe is so different from what is out there.

Again, I pray and hope the best for you, Shawn
Re: I have some bad news
On 08/16/16 15:22, Ted Mittelstaedt wrote:
I read through the site, and here's why I probably couldn't implement it, at least not as it stands now. SpamAssassin basically depends on a diet of spam to feed the learner. The learner learns what is spam. If you add some ham into the learner it works better - but the main thrust of it is feed me spam, feed me spam. Your method depends on a diet of -ham-, not spam, because you are doing the opposite of SA.

My problem as an admin is this. I can guarantee that when a customer complains about a piece of junk, what they give me is junk. But customers don't complain about ham, so I'm not going to see it. And I cannot just iterate through all my customer mailboxes and assume they are all full of ham, because some of my customers are lazy and won't delete spam, or they don't read their mailbox for months at a time, etc. I cannot guarantee I'll get only ham by doing that, and therefore I don't have a guaranteed source of ham.

You said that your existing Perl scripts are hacks and ugly. But I'm wagering that most of your ugly programming is user interface code that somehow coaxes your users to yield up a diet of ham. My problem is there is a tremendous dearth of user interface code out there to get EITHER spam or ham. The closest I have ever found is the MailWatch interface, but that is god-awful complex. I have it running on an ISP customer of mine's mail server, but God, what a hack. Without that, all I can do is what I do now, which is make sure that all customers accessing my server with IMAP have a junk mail folder and know that if they drag spam into there, I'll suck it into the learner. Of course, POP3 clients have nothing, and I cannot tell some POP3 user "Oh, if you really want to reduce your spam load then give up your POP3 email client and use this slick web interface I have set up for you to send and receive email."
I'm actually not as interested in your engine as I am in how you get your customers to participate with it, because if you have found a way to get 'em to do it, that is truly revolutionary. Mine would rather bitch and moan about spam and, when they get it, just delete it - which puts it in a deleted folder that I can get at (if they are IMAP), but mixes it up with deleted ham, so I cannot take that mess of mixed unidentified spam and ham and use it for anything. Ted

Hi Ted, My system depends on a stream of both ham and spam, creating a ham corpus and a spam corpus. I already had many rules in place (not SA) to identify ham. Actually, all you need is my RBL hostkarma.junkemailfilter.com: a result of 127.0.0.1 plus good FcRDNS - there's your ham stream. SA has a mindset of detecting spam. You have to change that to detecting spam and ham. Once you have streams going into the learner, you can not only increase spam detection but also positively identify good email as good and have almost no false positives. Then the output with strong scores is fed back into the learner, where it learns how people who send ham speak and how people who send spam speak. And it's very, very effective, and I'm just giving it away. Thanks for looking at it though.
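A sketch of the ham-stream test Marc describes: a hostkarma.junkemailfilter.com answer of 127.0.0.1 plus good forward-confirmed reverse DNS (FcRDNS). Assumptions, stated loudly: the query name uses the usual DNSBL convention of reversed IPv4 octets followed by the zone; only the 127.0.0.1 meaning comes from the post; only the first A record is examined; IPv6 is not handled; and whether the service still answers today is unknown. All function names are hypothetical.

```python
import ipaddress
import socket

HOSTKARMA_ZONE = "hostkarma.junkemailfilter.com"


def dnsbl_query_name(ip: str, zone: str = HOSTKARMA_ZONE) -> str:
    """Conventional DNSBL lookup name: IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError on bad input
    return ".".join(reversed(octets)) + "." + zone


def is_fcrdns(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must resolve back to the same IP."""
    try:
        ptr_name, _aliases, _ = socket.gethostbyaddr(ip)
        _, _, forward_ips = socket.gethostbyname_ex(ptr_name)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    return ip in forward_ips


def looks_like_ham_source(ip: str) -> bool:
    """Hypothetical helper: hostkarma answer 127.0.0.1 and good FcRDNS."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip))
    except OSError:  # not listed, or the resolver failed
        return False
    return answer == "127.0.0.1" and is_fcrdns(ip)


print(dnsbl_query_name("192.0.2.10"))  # 10.2.0.192.hostkarma.junkemailfilter.com
```

Messages from sources passing `looks_like_ham_source` would then be fed to the ham side of the learner; the network calls are deliberately kept out of the example run.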
Re: I have some bad news
Thanks for the encouragement Ted. Unfortunately I know way too much about mathematics and I have a deep understanding of probability spectrums. There's a curve and I'm going to be somewhere on it. If I'm lucky I might be here for some time. But my life is a casino right now. And yes - there is also a probability spectrum for any of us getting hit by a bus tomorrow as well. SpamAssassin is based on statistical probabilities. I have to have a dual-track strategy. On one hand I need to do what I can to move the curve into the future. But at the same time I need to accomplish things that are important within a limited time slot as well.

Spam filtering isn't just another job to me. I actually have a passion for it. On a philosophical basis I look at the internet as the new nervous system for humanity, which is now core to who we are as a species. And email is a very key technology in that nervous system. In that context spam is like poison through which predators suck some of the life out of humanity, and my real life has always been about the progress of the human race. I am somewhat of a spam fighting savant. I actually run very little of my email through SpamAssassin, truth be told. Over the years I've thrown some ideas into the mix and sometimes they have been adopted to make SA better. Sometimes I just get shouted down by trolls and the ideas go nowhere. At this point, however, there's a deadline, and I have ideas that could be implemented in SA very, very easily. In fact it was through SA that I discovered Redis, and SA already talks to Redis.

Although my innovation is excellent, as a programmer I'm mediocre. Never worked as a team. Easily frustrated. Probably somewhat autistic and somewhat arrogant. So mostly living in my own world doing my own development. I have my little online empire. I work from home. I make a great living. And I really like (most of) my customers and enjoy doing tech support.
And it's allowed me a lot of free time to do things that I'm really interested in. But my ideas are now my immortality, so I'm releasing this to the world, mostly this simple AI method that SA could easily implement. This new spam filtering trick is not only extremely effective, it's extremely simple. I had it working in 2 days. The developers here could probably implement it in 1 day (at least the core functionality), and with a team of better programmers probably do a better job and get an even better result than I get. In fact you don't need or even want my sloppy code (not in Perl). All you need is to read the description of how it works, and once you get it, coding it is trivial. So - this is an opportunity to milk the mind of the dying spam savant. It works, it's easy, and I'm just handing it to you all. There is no reason I would be making this up. All you need to do is accept this gift.

On 08/16/16 01:03, Ted Mittelstaedt wrote:
Hi Marc, Back in 1994 I was diagnosed with testicular cancer; it was essentially "stage 4" as it had metastasized throughout my body. But it responded to chemo and here I am today. In fact, ironically, my original oncologist died a few years ago - on a fishing trip he had an accident and drowned. The Universe has an interesting sense of humor and likes to throw curve balls. Take what you have been told about your "probability spectrum" and toss it in the trash - hakuna matata. You could accidentally step in front of a bus tomorrow and be dead. You could live another 20 years. Statistics on people only have meaning for large groups of people - they are irrelevant when it comes to the individual. I've met a number of people who had serious cancers. And I learned one thing from that. The people who survived - every one of them, fighters. And everyone fights differently. Some get on the food bandwagon and try overdosing on green tea and every alleged anti-cancer food out there.
Others jump into yoga, and I knew one guy who went out and binge-watched Monty Python to spend as much time laughing as possible. Me, I fought with a more mental approach. I dropped everything in my life that I was not completely satisfied with - I turned my back on my job, my apartment, etc. - every burden or responsibility that I had which I didn't like and didn't really want - and dove into the treatment, and I never let myself believe I was in any danger of dying. Of course, not all who fight survive. But I will say with absolute conviction that everyone I ever met who had a serious cancer and had that "attitude of acceptance" later died. You are a fighter or you wouldn't even be here. Now, fight to win. Ted
I have some bad news
the essence of who I am and what makes my existence have meaning will be preserved. I have always believed that if a person decides to "own their story" and chooses to live a life worth living, then when they are faced with the end of their personal existence it will be much easier. And now that I am there I can say it is definitely true. I have not lived a perfect life, and looking back there are quite a few things where I could have made a better choice. But at this point I'm feeling unusually positive about my situation as my last adventures unfold. While I have spent much of my life writing software for cyberspace, I have also written quite a bit of software for meat space. This email is an example of that. Meat space is coded in ideas and philosophies, and I'm hoping in the time I have left to see what else I can accomplish. Facing death definitely sharpens the mind, so I'm going to take advantage of that. I suppose I'll wrap this up here as I can ramble on forever. And forever isn't quite as long as it used to be. Marc Perkel /root
Re: ixhash.junkemailfilter.com seems to be broken currently
On 06/21/16 00:03, Alessio Cecchi wrote:
On 20/06/2016 16:22, Reindl Harald wrote:
Since Marc is present on this list and maybe others are using it too:

dig A c134389d7cefd3aadce78714669239f2.ixhash.junkemailfilter.com.
status: SERVFAIL
Query time: 1798 msec

So for at least the last 2 days the rule below slows down scanning:

score       JEF_IXHASH 1.0
ixhashdnsbl JEF_IXHASH ixhash.junkemailfilter.com.
body        JEF_IXHASH eval:check_ixhash('JEF_IXHASH')
describe    JEF_IXHASH DIGEST: ixhash.junkemailfilter.com

Hi, some weeks ago Marc confirmed to me that ixhash.junkemailfilter.com is no longer in use. Ciao

Yeah - I had problems keeping it stable. But I'd still like to contribute. If someone wants to help me get it going again or wants a spam feed from me, I'll set it up.
Spam Filtering Trick that could be easily adapted to Spam Assassin
Tried to send this to the list but I think it had so many spam phrases that it got blocked. So I'll just link to my wiki. http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter This is a spam filtering trick I'm using; it's not SA, but it could be easily adapted to SA. Rather than just scanning for regex strings, it's useful to have a way to tell what things the message is talking about and reduce those to a single token that represents a concept. Then the concepts can be combined to produce rules or fed into Bayes for automatic scoring.
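A rough sketch of the concept-token idea, assuming a hand-written table that maps several regex phrasings to one concept name. The concept names, patterns and the pairing helper are all hypothetical illustrations, not the rules from the wiki page; they only show the shape of "reduce phrases to one concept token, then combine concepts".

```python
import re

# Hypothetical concept table: several surface phrasings collapse to one token.
CONCEPTS = {
    "CONCEPT_PHARMA": [r"\bv[i1!]agra\b", r"\bcialis\b"],
    "CONCEPT_CHEAP": [r"\bcheap(est)?\b", r"\blowest prices?\b", r"\bdiscount\b"],
    "CONCEPT_CLICK": [r"\bclick (on )?(this|the|here)\b"],
}
COMPILED = {
    name: [re.compile(p, re.IGNORECASE) for p in patterns]
    for name, patterns in CONCEPTS.items()
}


def concepts(text: str) -> set[str]:
    """Concept tokens whose patterns appear anywhere in the text."""
    return {name for name, pats in COMPILED.items() if any(p.search(text) for p in pats)}


def concept_pairs(found: set[str]) -> set[str]:
    """Combine concepts pairwise into compound tokens for rules or a Bayes learner."""
    ordered = sorted(found)
    return {f"{a}+{b}" for i, a in enumerate(ordered) for b in ordered[i + 1:]}


found = concepts("Get the CHEAPEST V1agra, click here!")
print(sorted(found))
print(sorted(concept_pairs(found)))
```

Each compound token such as `CONCEPT_CHEAP+CONCEPT_PHARMA` can then be treated like any other fingerprint, which is how a handful of concepts could stand in for many regex variants.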
Reporting gmail spam to Google
Is there an address where I can forward Gmail spam to Google for reporting?
Re: Interesting rule combo results
On 03/09/16 07:33, Dave Funk wrote:
On Tue, 8 Mar 2016, Marc Perkel wrote:
This is from the for-what-it's-worth department. I've generated the following rule combination lists. The ham list contains rule combinations with 0 spam hits, sorted by the number of ham hits. The spam list contains rule combinations with 0 ham hits, sorted by the number of spam hits. There are some of my personal rules mixed in. Just posting this to see if anyone sees any value in it.

SPAM RULES:
11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE
[snip..]

HAM RULES:
132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC
[snip..]
80056 HTML_MESSAGE
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID
[snip..]

Marc, Maybe I'm misunderstanding your list, but it looks like you've got HTML_MESSAGE by itself in the HAM RULES (i.e. zero spam hits on HTML_MESSAGE), but you've also got a rule combo of HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP as the top SPAM RULES (which implies that there is SPAM that hits HTML_MESSAGE too). Similar situation for DKIM_SIGNED & DKIM_VALID. Also, how can you have 132983 hits on the combo of DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC but only 59189 hits on DKIM_SIGNED by itself?

That's a valid observation. In the learner I'm working on I'm experimenting with an interesting forgetter that wipes out and restarts some of the keys.
Part of the process of getting rid of bad data takes some good data with it, and usually the good data recovers over time. This is still very experimental. I'm applying my new filter only to the rule names coming out of SA, completely ignoring the scoring or even whether it's a spam or ham rule. I just wanted to see what the result would be, to see if I can generate SA rules from my data. So far - crude at best.
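A sketch of how lists like these could be produced from per-message SpamAssassin rule hits, counting each combination at most once per message and then keeping combinations seen on one side only. This is an illustration of the idea, not Marc's learner; `max_size=4` is an assumption based on the longest combinations shown, and the three messages are made up.

```python
from collections import Counter
from itertools import combinations


def combo_counts(messages, max_size=4):
    """Count ham and spam hits for every combination (1..max_size rules) per message.

    `messages` is an iterable of (rule_names, is_spam) pairs.
    """
    ham, spam = Counter(), Counter()
    for rules, is_spam in messages:
        target = spam if is_spam else ham
        names = sorted(set(rules))
        for k in range(1, min(max_size, len(names)) + 1):
            for combo in combinations(names, k):
                target[combo] += 1
    return ham, spam


def one_sided(primary, other):
    """Combinations with hits in `primary` and zero hits in `other`, most hits first."""
    return sorted(((n, c) for c, n in primary.items() if other[c] == 0), reverse=True)


# Three made-up messages: two ham, one spam.
msgs = [
    (["HTML_MESSAGE", "DKIM_SIGNED"], False),
    (["HTML_MESSAGE", "DKIM_SIGNED", "MAILTO_LINK"], False),
    (["HTML_MESSAGE", "RAZOR2_CHECK"], True),
]
ham, spam = combo_counts(msgs)
for n, combo in one_sided(ham, spam):
    print(n, " ".join(combo))
```

With exact counting, HTML_MESSAGE alone never reaches the ham-only list once a single spam fires it, and no combination can outscore any rule it contains. So entries like those Dave and RW questioned point at the counts themselves (for example the experimental forgetter or partial training data Marc mentions) rather than at the mail stream.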
Re: Interesting rule combo results
On 03/09/16 06:45, RW wrote:
On Tue, 8 Mar 2016 22:25:09 -0800 Marc Perkel wrote:
This is from the for-what-it's-worth department. I've generated the following rule combination lists. The ham list contains rule combinations with 0 spam hits, sorted by the number of ham hits. The spam list contains rule combinations with 0 ham hits, sorted by the number of spam hits.
...
...
HAM RULES:
...
80056 HTML_MESSAGE

What's happening here? This seems to imply that HTML_MESSAGE only appears in ham.

I think my results are a little strange in that I might not be training on all the data, just on what gets past all my other filters. I'm still working on this but thought I'd share what it came up with, for better or worse.
Interesting rule combo results
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID
36441 HTML_MESSAGE MAILTO_LINK
36399 MAILTO_LINK
34960 DKIM_VALID_AU
32155 DKIM_SIGNED FREEMAIL_FROM_END_DIGIT MAILTO_LINK
31739 DKIM_VALID FREEMAIL_FROM_END_DIGIT MAILTO_LINK
31586 DKIM_SIGNED USER_IN_DEF_DKIM_WL
31491 DKIM_VALID USER_IN_DEF_DKIM_WL
31354 DKIM_VALID_AU USER_IN_DEF_DKIM_WL
30286 DKIM_SIGNED FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
30191 DKIM_VALID FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
29567 HTML_MESSAGE USER_IN_DEF_DKIM_WL
28448 USER_IN_DEF_DKIM_WL
27835 DKIM_SIGNED DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
27819 DKIM_VALID DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
27497 DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT MAILTO_LINK
27224 DKIM_VALID_AU FREEMAIL_FROM HTML_MESSAGE MAILTO_LINK
27135 DKIM_VALID_AU FREEMAIL_ENVFROM_END_DIGIT MAILTO_LINK
25738 DKIM_SIGNED DKIM_VALID_AU HTML_MESSAGE LOTS_OF_MONEY
25721 DKIM_VALID DKIM_VALID_AU HTML_MESSAGE LOTS_OF_MONEY
24140 HTML_MESSAGE WHILE_SUPPLIES
23120 BANG_MORE
22958 DKIM_SIGNED WHILE_SUPPLIES
22434 DKIM_VALID_AU FREEMAIL_FROM MIME_QP_LONG_LINE
22406 CALL_FREE DKIM_SIGNED DKIM_VALID HTML_MESSAGE
20571 DKIM_SIGNED HTML_MESSAGE MAILTO_LINK RDNS_DYNAMIC
16517 CALL_FREE DKIM_SIGNED DKIM_VALID_AU HTML_MESSAGE
16429 CALL_FREE DKIM_VALID DKIM_VALID_AU HTML_MESSAGE
16263 DKIM_VALID_AU URI_TRY_3LD
16036 DKIM_SIGNED DKIM_VALID USER_IN_DEF_DKIM_WL
15975 DKIM_SIGNED DKIM_VALID_AU USER_IN_DEF_DKIM_WL
15940 DKIM_VALID DKIM_VALID_AU USER_IN_DEF_DKIM_WL
15036 DKIM_VALID_AU HTML_IMAGE_RATIO_02 HTML_MESSAGE MIME_HTML_ONLY
14919 GMD_PDF_SQUARE
14834 DKIM_SIGNED FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
14745 DKIM_SIGNED DKIM_VALID FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
14661 DKIM_VALID FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
14459 DKIM_SIGNED HTML_MESSAGE USER_IN_DEF_DKIM_WL
14431 DKIM_VALID HTML_MESSAGE USER_IN_DEF_DKIM_WL
14409 DKIM_VALID_AU HTML_MESSAGE USER_IN_DEF_DKIM_WL
14030 DKIM_SIGNED FREEMAIL_ENVFROM_END_DIGIT FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
13632 DKIM_SIGNED HTML_MESSAGE MAILTO_LINK MIME_QP_LONG_LINE
13351 HTML_MESSAGE SUBJ_2_CREDIT
13265 SUBJ_2_CREDIT
13163 DKIM_SIGNED SUBJ_2_CREDIT
13055 DKIM_SIGNED DKIM_VALID MAILTO_LINK MIME_QP_LONG_LINE
13037 DKIM_SIGNED DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
13033 DKIM_VALID DKIM_VALID_AU FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
12958 DKIM_VALID_AU FREEMAIL_FROM FREEMAIL_FROM_END_DIGIT HTML_MESSAGE
12879 DKIM_VALID HTML_MESSAGE MAILTO_LINK MIME_QP_LONG_LINE
12514 GMD_PDF_SQUARE HTML_MESSAGE
12238 DKIM_SIGNED HTML_MESSAGE LOTS_OF_MONEY MIME_HTML_ONLY
12220 DKIM_SIGNED DKIM_VALID LOTS_OF_MONEY MIME_HTML_ONLY
12071 MAILTO_LINK WHILE_SUPPLIES
12044 DKIM_VALID HTML_MESSAGE LOTS_OF_MONEY MIME_HTML_ONLY
11997 DKIM_VALID SUBJ_2_CREDIT
11667 DKIM_VALID_AU SUBJ_2_CREDIT
11360 CALL_FREE DKIM_SIGNED DKIM_VALID MAILTO_LINK
11298 DKIM_VALID_AU FREEMAIL_FROM MAILTO_LINK
11270 DKIM_SIGNED HTML_IMAGE_RATIO_04 HTML_MESSAGE MAILTO_LINK
11152 DKIM_SIGNED DKIM_VALID HTML_IMAGE_RATIO_04 MAILTO_LINK
10988 DKIM_VALID RAZOR2_CHECK SUBJ_GROUP
10935 CALL_FREE DKIM_SIGNED HTML_MESSAGE MAILTO_LINK
10825 MIME_HTML_ONLY USER_IN_DEF_DKIM_WL
10809 DKIM_VALID HTML_IMAGE_RATIO_04 HTML_MESSAGE MAILTO_LINK
10790 DKIM_SIGNED MAILTO_LINK SUBJ_ALL_CAPS
10499 CALL_FREE DKIM_VALID HTML_MESSAGE MAILTO_LINK
10247 DKIM_SIGNED DKIM_VALID HTML_MESSAGE SUBJ_GROUP
10223 DKIM_VALID MAILTO_LINK SUBJ_ALL_CAPS
10047 DKIM_SIGNED DKIM_VALID_AU HTML_IMAGE_RATIO_02 MAILTO_LINK
10028 DKIM_VALID DKIM_VALID_AU HTML_IMAGE_RATIO_02 MAILTO_LINK
Anyone using ASN data
Just wondering if anyone is using ASN information and if so - what are you doing?
Re: PDF files containing executables?
On 03/03/16 13:27, John Hardin wrote: On Thu, 3 Mar 2016, Dianne Skoll wrote: On Thu, 3 Mar 2016 13:03:44 -0800 Marc Perkel <supp...@junkemailfilter.com> wrote: Thanks for the response. I'm in the spam filtering business and I'm wondering what I can use (from the command line?) to detect if a PDF has any kind of script attached that would be executable. That way I might block based on what's embedded in a PDF. There are tools. Google is your friend. However, many legitimate PDF files contain Javascript snippets. Blocking solely on that basis will lead to many FPs. I'd argue the "legitimate" part of that statement... :) Sounds to me like it should be: block any PDF with javascript/flash/java with whitelisted bypass. What sane MTA accepts bare executable attachments from the Internet at large any more? The same policy should apply to PDFs. If I could detect java or some other executable inside a PDF, then the message would have to be white or near white before I allowed it to pass.
Re: PDF files containing executables?
On 03/03/16 13:15, Dianne Skoll wrote: On Thu, 3 Mar 2016 13:03:44 -0800 Marc Perkel <supp...@junkemailfilter.com> wrote: Thanks for the response. I'm in the spam filtering business and I'm wondering what I can use (from the command line?) to detect if a PDF has any kind of script attached that would be executable. That way I might block based on what's embedded in a PDF. There are tools. Google is your friend. However, many legitimate PDF files contain Javascript snippets. Blocking solely on that basis will lead to many FPs. Regards, Dianne. In that case I'd like to know if there's java in it, so that if the message has other risk flags I can block it.
Re: PDF files containing executables?
On 03/03/16 13:02, David B Funk wrote: On Thu, 3 Mar 2016, Marc Perkel wrote: A customer of mine inquired about executable viruses inside of PDF files. Is that so? And if it is - is there any way of detecting executables inside of PDF? I don't know that PDFs can contain classical ".exe" type executables but they can clearly contain 'active content' (javascript, flash, etc) which can be abused as a malware delivery vehicle. So for practical purposes PDFs can be considered potential virus containers. AV scanners have rules for detecting malware inside PDFs but that's always a catch-up game. Hi David, Is there a way to detect any executable code so that I can just block all PDF files with executables?
Re: PDF files containing executables?
Hi Kevin, Thanks for the response. I'm in the spam filtering business and I'm wondering what I can use (from the command line?) to detect if a PDF has any kind of script attached that would be executable. That way I might block based on what's embedded in a PDF. On 03/03/16 12:59, Kevin Miller wrote: Not sure about viruses per se, but I know that there have been instances of embedded javascript in .pdf files which have been malicious. Javascript can be turned off in Acrobat preferences. Likely a toggle in other .pdf readers as well. ...Kevin -- Kevin Miller Network/email Administrator, CBJ MIS Dept. 155 South Seward Street Juneau, Alaska 99801 Phone: (907) 586-0242, Fax: (907) 586-4588 Registered Linux User No: 307357 -----Original Message----- From: Marc Perkel [mailto:supp...@junkemailfilter.com] Sent: Thursday, March 03, 2016 11:26 AM To: users@spamassassin.apache.org Subject: PDF files containing executables? A customer of mine inquired about executable viruses inside of PDF files. Is that so? And if it is - is there any way of detecting executables inside of PDF?
PDF files containing executables?
A customer of mine inquired about executable viruses inside of PDF files. Is that so? And if it is - is there any way of detecting executables inside of PDF?
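For the "what can I use from the command line" question, here is a minimal Python sketch, not a finished scanner: it looks through the raw PDF bytes for name objects that usually signal active or embedded content, after undoing the `#hh` hex escapes that are sometimes used to hide them (for example `/J#61vaScript`). The keyword list is an assumption modeled on the kind of keywords Didier Stevens' pdfid tool reports, and `risky_names`/`scan_file` are hypothetical helper names. Names inside compressed object streams are invisible to this approach. Because, as Dianne points out, plenty of legitimate PDFs contain JavaScript, forms or open actions, the counts are better used as one risk flag among others than as a hard block.

```python
import re

# Name objects that often indicate active or embedded content in a PDF.
# Heuristic list (similar to the keywords pdfid reports); /AcroForm and
# /OpenAction also occur in plenty of legitimate PDFs.
RISKY_NAMES = {
    "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/RichMedia", "/XFA", "/AcroForm",
}

# A PDF name is "/" followed by non-delimiter, non-whitespace bytes.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
# Inside a name, "#hh" encodes one byte in hex (e.g. /J#61vaScript).
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Undo #hh escapes so obfuscated names compare equal to plain ones."""
    unescaped = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return unescaped.decode("latin-1")


def risky_names(pdf_bytes: bytes) -> dict:
    """Count risky name objects visible in the raw PDF bytes.

    Names inside compressed streams (e.g. object streams) are not seen.
    """
    counts = {}
    for match in NAME_RE.finditer(pdf_bytes):
        name = decode_name(match.group(0))
        if name in RISKY_NAMES:
            counts[name] = counts.get(name, 0) + 1
    return counts


def scan_file(path: str) -> dict:
    """Read a PDF attachment from disk and count its risky names."""
    with open(path, "rb") as handle:
        return risky_names(handle.read())
```

An empty result only means nothing was visible in the uncompressed bytes; a non-empty one could be combined with other risk flags, along the lines of the "white or near white" policy discussed in the thread.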
Re: Redis Bayes Expire
On 03/02/16 08:02, Axb wrote: On 03/02/2016 04:52 PM, Marc Perkel wrote: My Redis bayes keeps growing. It acts like it's not expiring like it should. Do I need to do something to force expire? Also - anything else I should set? Here are my settings. bayes_sql_dsn server=localhost:6379 use_bayes 1 use_bayes_rules 1 # Your choice if you want to use auto_learn bayes_auto_learn 1 use_learner 1 bayes_learn_to_journal 0 # THIS IS MANDATORY - You do NOT need to run sa-learn to expire tokens # *_ttl below takes care of it. bayes_auto_expire 1 # You will need to change this according to your need # This replaces sa-learn's sql/file based expire routines. bayes_token_ttl 3d bayes_seen_ttl 1d run "redis-cli info" and see "expired_keys" expired_keys:0 This doesn't look right. db0:keys=56725213,expires=3,avg_ttl=257005915
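The `db0:` line from the Keyspace section of `redis-cli info` (also available on its own via `redis-cli info keyspace`) already shows the problem: of 56,725,213 keys, only 3 carry a TTL, so Redis has almost nothing it is allowed to expire, which fits `expired_keys:0`. The small Python helper below is hypothetical (not part of SpamAssassin or Redis); it just parses that line so the ratio is easy to check over time. It says nothing about *why* the TTLs are not being set. As a side note, `avg_ttl` is reported in milliseconds, so 257005915 is roughly 3 days, in line with `bayes_token_ttl 3d`, but that average covers only the 3 keys that have a TTL.

```python
def parse_keyspace_line(line: str) -> dict:
    """Parse one Keyspace line, e.g. 'db0:keys=56725213,expires=3,avg_ttl=257005915'."""
    db, _, fields = line.strip().partition(":")
    stats = {"db": db}
    for field in fields.split(","):
        name, _, value = field.partition("=")
        stats[name] = int(value)
    return stats


def expiring_fraction(line: str) -> float:
    """Share of keys in the database that have a TTL set (0.0 if empty)."""
    stats = parse_keyspace_line(line)
    if stats["keys"] == 0:
        return 0.0
    return stats["expires"] / stats["keys"]
```

With expiry working as intended, one would expect `expires` to track `keys` closely, so the fraction should be near 1 rather than near 0.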
Redis Bayes Expire
My Redis bayes keeps growing. It acts like it's not expiring like it should. Do I need to do something to force expire? Also - anything else I should set? Here are my settings. bayes_sql_dsn server=localhost:6379 use_bayes 1 use_bayes_rules 1 # Your choice if you want to use auto_learn bayes_auto_learn 1 use_learner 1 bayes_learn_to_journal 0 # THIS IS MANDATORY - You do NOT need to run sa-learn to expire tokens # *_ttl below takes care of it. bayes_auto_expire 1 # You will need to change this according to your need # This replaces sa-learn's sql/file based expire routines. bayes_token_ttl 3d bayes_seen_ttl 1d
Re: Error when trying to re-use Bayes database from one server to another
On 02/13/16 00:42, Reindl Harald wrote: On 13.02.2016 at 02:56, Marc Perkel wrote: For what it's worth - just used Redis. Redis is the only thing that's worked reliably for me. You can't use Redis when it comes to different servers in different networks for different clients. BDB works fine and reliably, at least without autolearning and autoexpire and having the bayes-db path read-only for the running spamd with namespaces 0 60388 SPAM 0 21651 HAM 0 2510401 TOKEN total 73M -rw------- 1 sa-milt sa-milt 10M 2016-02-13 09:12 bayes_seen -rw------- 1 sa-milt sa-milt 81M 2016-02-13 09:12 bayes_toks I'm filtering 5000 domains using a single redis server and 4 SA servers.
Re: Error when trying to re-use Bayes database from one server to another
On 02/13/16 07:25, Reindl Harald wrote: On 13.02.2016 at 15:59, Marc Perkel wrote: On 02/13/16 00:42, Reindl Harald wrote: On 13.02.2016 at 02:56, Marc Perkel wrote: For what it's worth - just used Redis. Redis is the only thing that's worked reliably for me. You can't use Redis when it comes to different servers in different networks for different clients. BDB works fine and reliably, at least without autolearning and autoexpire and having the bayes-db path read-only for the running spamd with namespaces 0 60388 SPAM 0 21651 HAM 0 2510401 TOKEN total 73M -rw------- 1 sa-milt sa-milt 10M 2016-02-13 09:12 bayes_seen -rw------- 1 sa-milt sa-milt 81M 2016-02-13 09:12 bayes_toks I'm filtering 5000 domains using a single redis server and 4 SA servers. Looks like you refused to understand 'different networks'. It's fine in your infrastructure, but it won't work in the cases we have in real life, where another company with independent infrastructure fetches our bayes as part of a subscription over web services, moves the files into a temp folder, and trains its own samples before replacing the local bayes with the result. Reason? 2510401 tokens with dump in and dump out is horribly slow when the number of local samples is around 1000 messages versus 82 messages we have fed since 2014. Or would you open your redis server to 3rd parties on the WAN? I'm using SSH tunneling to keep my redis private. Maybe I'm not understanding what you are trying to do?
Re: URIBL/DNSBL from a database
On 02/12/16 05:39, Alex wrote: Hi, For some time now I've been cycling URLs and IPs through a mariadb database gathered from incoming mail on a honeypot I've created. Surprising how many are received ahead of spamhaus/barracuda. I'm looking for ideas on how to now make this information available to spamassassin on my production system. I'd like to somehow export the IPs, any URLs in the body, and email addresses to spamassassin. Is it possible for spamassassin to query a database directly? I'm familiar with how to create a uridnsbl, but is DNS the best approach here? The info needs to be updated and reloaded rapidly, and not all the info (URLs, emails) are conducive to being in DNS. Is anyone else doing this, and are you just rejecting the IPs at the SMTP level outright? Thanks, Alex Yeah - unless you write your own SA module, using DNS is the quick, easy solution.
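The DNS route works because DNS blocklists have a simple lookup convention (described in RFC 5782): an IPv4 address is queried with its octets reversed and prepended to the list's zone, a domain (for URI lists) is prepended as-is, and an A record answer, conventionally in 127.0.0.0/8, means "listed" while NXDOMAIN means "not listed". Below is a minimal Python sketch of building those query names, for example to verify entries exported from a honeypot database. The zone names are hypothetical placeholders, and the sketch does not cover serving the zone or wiring it into SpamAssassin rules. Email addresses do not map cleanly onto DNS labels, which is the limitation Alex raises.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone.

    192.0.2.10 checked against bl.example.net becomes
    10.2.0.192.bl.example.net.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def uribl_query_name(domain: str, zone: str) -> str:
    """Domain-based lists are queried with the domain itself prepended."""
    return f"{domain.lower().rstrip('.')}.{zone}"
```

Because every lookup is just a DNS query, refreshing the published zone from the database is what controls how quickly new honeypot hits become visible to SpamAssassin.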
Re: Error when trying to re-use Bayes database from one server to another
Any chance that the parent directory structure doesn't have enough permissions? The error message says it can't access it so there's your clue. Since the files themselves seem to have good permissions I would look at the directories. On 02/12/16 08:29, Sebastian Arcus wrote: As per advice from this list, I have been re-using my bayes databases on several different servers running SA. On one of the servers though, the database is not accepted. I re-transferred them several times over ssh, to make sure they were not corrupted. The database files are in the correct location, with correct permissions and owned by the correct user: # ls -l /var/spool/spamd/bayes/ total 5912 -rw-rw-rw- 1 spamd spamd 1310720 2016-02-09 08:42 bayes_seen -rw-rw-rw- 1 spamd spamd 4739072 2016-02-09 08:43 bayes_toks The version of SA on both the donor and receiving servers is 3.4.1. When I try to learn a new message on the receiving server (where I moved the bayes files), I get the following error: # su - spamd -c "/usr/bin/sa-learn -D --spam /New\ UnansweredSexHookup\ Request.eml" Feb 12 16:20:53.777 [12973] dbg: locker: mode is 438 Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: created /var/spool/spamd/bayes/bayes.lock.mdr-server.mdrinteriors.co.uk.12973 Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: trying to get lock on /var/spool/spamd/bayes/bayes with 0 retries Feb 12 16:20:53.778 [12973] dbg: locker: safe_lock: link to /var/spool/spamd/bayes/bayes.lock: link ok Feb 12 16:20:53.778 [12973] dbg: bayes: tie-ing to DB file R/W /var/spool/spamd/bayes/bayes_toks Feb 12 16:20:53.779 [12973] dbg: bayes: untie-ing DB file toks Feb 12 16:20:53.779 [12973] dbg: locker: safe_unlock: unlink /var/spool/spamd/bayes/bayes.lock bayes: cannot open bayes databases /var/spool/spamd/bayes/bayes_* R/W: tie failed: No such file or directory Learned tokens from 0 message(s) (1 message(s) examined) Feb 12 16:20:53.779 [12973] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x93106d0) implements 
'learner_close', priority 0 ERROR: the Bayes learn function returned an error, please re-run with -D for more information at /usr/bin/sa-learn line 498.
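Marc's suggestion to look at the directories can be checked mechanically. To open /var/spool/spamd/bayes/bayes_toks, a user needs search (execute) permission on every parent directory, no matter how permissive the file's own mode (0666 here) is. The Python sketch below lists the directories the current user cannot traverse. `blocking_directories` is a hypothetical helper, it has to be run as the spamd user (the same way the sa-learn command in the thread was run) to be meaningful, and it only tests this one hypothesis. A directory that does not exist is reported as well, since it cannot be entered either.

```python
import os
from pathlib import Path


def blocking_directories(target: str) -> list:
    """List parent directories of `target` that the current user cannot enter.

    Opening a file requires execute (search) permission on every directory
    on its path, no matter how permissive the file's own mode is.
    """
    path = Path(target).resolve()
    blocked = []
    # Walk from the filesystem root down to the file's own directory.
    for directory in reversed(path.parents):
        if not os.access(directory, os.X_OK):
            blocked.append(str(directory))
    return blocked
```

An empty list rules out the traversal-permission explanation, so the next place to look would be something else in the "tie failed" path.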
Re: Error when trying to re-use Bayes database from one server to another
For what it's worth - just used Redis. Redis is the only thing that's worked reliably for me.
Question about spam report header
Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __ . Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it?
Re: Question about spam report header
perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/* This converts the rules. I'm doing something very interesting. It's going to take a few days to see if it works. I'm applying the same techniques of my evolution filter to the SA rule names. I extract the names and then run them into a program that creates all combinations up to 4 levels and learns those combos as either spam or ham. Then, after building the ham and spam corpus sets, I take the test message, create its set of rule combinations, and then do set compares against the ham and spam sets. What I'm looking for is combos matching ham and NOT matching spam - or - combinations matching spam and NOT matching ham. In theory I should be able to create thousands of combination rules for both ham and spam that all have a very high probability of being accurate. It's just an interesting experiment to see how well it works. Right now I have 151728 ham combinations and 113632 spam combinations. Of those only 22933 are in both sets. It's only been learning for one day. I want to see where it is after a week. By changing the rules from __ to T_ I exposed a lot more rule names. The way this works is that I don't need to know what rules are ham rules or spam rules in advance. And I don't need to score them. The filter figures it all out on its own. So the rule names are just information. I think this trick will make SA far more accurate. We'll see. I want to give it till at least Friday for the system to learn. I'm also storing hit counts so that I could pick out maybe the best 1000 rules and publish them. Anyhow - that's what I'm up to and so far results are good. But because it's early in the learning cycle most messages are not yet producing significant scores. The ones that are producing scores are making the right call, however. On 02/02/16 20:19, Dave Funk wrote: You can do that but it requires editing all your rule files, altho then you see those matches in all your reports.
If you just want to test one particular message, just use the -D option to spamassassin and grep for ' got hit: ' Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ==> got hit: "" Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ==> got hit: "<" Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ==> got hit: "" Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ==> got hit: "negative match" Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ==> got hit: "negative match" Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ==> got hit: "" (Yes, Marc, you probably already know this, this is for the other people who might be following this thread ;) On Tue, 2 Feb 2016, Marc Perkel wrote: Never mind I found that if I change __ to T_ that it does what I want. On 02/02/16 18:05, Marc Perkel wrote: On 02/02/16 17:55, Marc Perkel wrote: Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __ . Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it? I'm also - I suppose asking it to list rules that match that produce no scores. body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i body __CT_CLICK /\b(click(ing)? 
(here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i body __BENEFICIARY /\bbeneficiary\b/i body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i body __CT_UNITED_NATIONS /\bUnited Nations?\b/i meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$ meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDE
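The combination step Marc describes (take the names of the rules that hit, generate every combination up to 4 rules, and learn those as ham or spam) can be sketched in a few lines of Python. This is a minimal sketch, not Marc's program: `rule_combinations` is a hypothetical name, and sorting the names alphabetically so that the same hits always give the same key is an assumption, chosen to match the ordering seen in the "Interesting rule combo results" list (e.g. "DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY"). Each resulting string would then be added to the ham or spam set for the message being learned.

```python
from itertools import combinations


def rule_combinations(rule_names, max_size=4):
    """Yield each combination of 1..max_size distinct rule names as one string.

    Names are de-duplicated and sorted, so the same set of hits always maps
    to the same key, e.g. "DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY".
    """
    names = sorted(set(rule_names))
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            yield " ".join(combo)
```

The number of keys grows quickly with the number of hits: 5 rules give 30 combinations, while 40 rules (plausible once the __ rules are exposed as T_ rules) give 40 + 780 + 9880 + 91390 = 102090 per message, which is worth keeping in mind when storing them in Redis.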
Re: Question about spam report header
On 02/02/16 17:55, Marc Perkel wrote: Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __ . Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it? I'm also - I suppose asking it to list rules that match that produce no scores. body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i body __CT_CLICK /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i body __BENEFICIARY /\bbeneficiary\b/i body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i body __CT_UNITED_NATIONS /\bUnited Nations?\b/i meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$ meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2 meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND meta CT_GOD_BENEFICIARY __CT_GOD && __CT_VICTIM describe CT_GOD_BENEFICIARY God and Beneficiary score CT_GOD_BENEFICIARY 4 meta CT_GOD_BEGGER __CT_GOD && __CT_BEGGER describe CT_GOD_BEGGER Begging in Religious Language score
CT_GOD_BEGGER 3
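Dave Funk's alternative in this thread, running `spamassassin -D` and grepping for ' got hit: ', lists every hit, including __ sub-rules and zero-score rules, without editing any rule files. The Python sketch below pulls the rule names out of that debug output (which goes to stderr, e.g. `spamassassin -D < message.eml 2> debug.log`). `hit_rules` is a hypothetical helper, and its regex is written against the `rules: ran header rule NAME ==> got hit:` lines quoted in the thread. Other rule types may log in a slightly different form, so check it against your own -D output before relying on it.

```python
import re

# Written against debug lines like:
#   ... dbg: rules: ran header rule __MIME_VERSION ==> got hit: ""
HIT_RE = re.compile(r"rules: ran \w+ rule (\S+) ==> got hit")


def hit_rules(debug_output: str) -> list:
    """Rule names reported as hits, in first-seen order, duplicates removed."""
    names = (m.group(1) for m in HIT_RE.finditer(debug_output))
    return list(dict.fromkeys(names))
```

Lines ending in `got hit: "negative match"` (such as __KAM_UPS2 in Dave's sample) are counted as hits too, since the debug log reports them that way.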
Re: Question about spam report header
Never mind, I found that if I change __ to T_ it does what I want. On 02/02/16 18:05, Marc Perkel wrote: On 02/02/16 17:55, Marc Perkel wrote: Normally SA creates a header that has a list of the names of rules that matched. It skips the listing of hidden rules that start with __ . Is there a command where I can easily tell SA to include the hidden rules in the report in the headers so I can see all of it? I'm also - I suppose asking it to list rules that match that produce no scores. body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i body __CT_CLICK /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i body __BENEFICIARY /\bbeneficiary\b/i body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i body __CT_UNITED_NATIONS /\bUnited Nations?\b/i meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$ meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2 meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND meta CT_GOD_BENEFICIARY __CT_GOD && __CT_VICTIM describe CT_GOD_BENEFICIARY God and Beneficiary score CT_GOD_BENEFICIARY 4
meta CT_GOD_BEGGER __CT_GOD && __CT_BEGGER describe CT_GOD_BEGGER Begging in Religious Language score CT_GOD_BEGGER 3
Can your bayes do this?
OK - Just to show you this isn't Bayesian - see if you can do this. Here is a list of 5505874 words and phrases used in the subject line of HAM and never seen in the subject line of SPAM http://www.junkemailfilter.com/data/subject-ham.txt Here is a list of 3494938 words and phrases used in the subject line of SPAM and never seen in the subject line of HAM http://www.junkemailfilter.com/data/subject-spam.txt Hope you understand it now. Not Bayesian
Re: My new method for blocking spam - REVEALED!
Yes - you missed something. It is about intersecting one corpus and NOT intersecting the other. This is about what doesn't match - not what does. On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? "Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using Bayes' theorem to calculate a probability that an email is or is not spam." - https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering "the set of fingerprints of the test message is intersected with the spam and ham corpi creating sub sets of matches. Then you do a set diff both ways (ham - spam) (spam - ham) and whichever side is bigger wins. Generally it will match on only one side or very predominately on one side." - Marc Perkel You are still looking up words/phrases in a dictionary set, and coming up with a probability factor of which side it falls on (an application of Bayes' theorem). Or did I miss something? On Jan 20, 2016, at 9:17 AM, Wrolf <wr...@wrolf.net> wrote: Good luck with your patent application, it should be in the infinitely elastic queue right after my perpetual motion machine. Not sure how you will deal with the number of ham tokens in spam messages. Also not sure how much ham will get canned as spam - but then, maybe people shouldn't be sending each other poetry? haiku by email / blossoms in my inbox / drink morning coffee ;-) Wrolf wr...@wrolf.net On Wed, Jan 20, 2016 at 11:52 AM, Marc Perkel <supp...@junkemailfilter.com> wrote: OK - following up on this. I have my provisional patent filed. I'm still doing development to improve it and working on a licensing contract. But the license will be based on the Creative Commons patent with some restrictions added. Basically I want to get a license fee from the big guys and my spam filtering competitors.
So unless you are in the spam filtering business or have more than 10,000 email addresses it's not going to cost you anything. I'm going to describe the concept here. I'm not going to share my code because my code is specific to my system and it's a combination of bash scripts, redis, pascal, php, and Exim rules. And the open source programmers are likely to implement it better than I have. Basically I'm trying not to put myself out of business and this new method is a bigger breakthrough than Bayesian filtering. Maybe I should call it a new plan for spam? So - I'm just going to introduce the concept right now about how it works. Once you know what I'm doing it should be easy to implement; I had it working in a couple of days and I'm not an outstanding programmer. One thing to keep in mind is this is a paradigm shift. It's not about matching - *it's about NOT matching*. And although it is far better at catching spam, its best feature is actively identifying good email. The secret sauce Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's get some lunch". In fact there are an infinite number of words and phrases that are used in good email that are never ever used in spam. And if I'm using words and phrases *never used in spam* that are used in ham - it's good email. And similarly - if I'm using words and phrases that are used in spam and *never used in ham* - it's spam. So - how do I get a list of words and phrases never used in spam? I create a list of words and phrases that are used in spam and check to see if it's *not on the list*. What I do is tokenize the spammiest parts of the email, like the subject line, into phrases of 1, 2, 3, and 4 words.
the quick brown fox jumps over the lazy dog - becomes "the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" "brown fox" "quick brown fox" "the quick brown fox" "jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps" "over" "jumps over" "fox jumps over" "brown fox jumps over" "the" "over the" "jumps over the" "fox jumps over the" "lazy" "the lazy" "over the lazy" "jumps over the lazy" "dog" "lazy dog" "the lazy dog" "over the lazy dog" These tokens are learned as ham or spam and added to sets. I'm using Redis to do this because it has extremely fast set operations. I don't know of anything other than Redis that can d
My new method for blocking spam - REVEALED!
OK - following up on this. I have my provisional patent filed. I'm still doing development to improve it and working on a licensing contract. But the license will be based on the Creative Commons patent with some restrictions added. Basically I want to get a license fee from the big guys and my spam filtering competitors. So unless you are in the spam filtering business or have more than 10,000 email addresses it's not going to cost you anything. I'm going to describe the concept here. I'm not going to share my code because my code is specific to my system and it's a combination of bash scripts, redis, pascal, php, and Exim rules. And the open source programmers are likely to implement it better than I have. Basically I'm trying not to put myself out of business and this new method is a bigger breakthrough than Bayesian filtering. Maybe I should call it a new plan for spam? So - I'm just going to introduce the concept right now about how it works. Once you know what I'm doing it should be easy to implement; I had it working in a couple of days and I'm not an outstanding programmer. One thing to keep in mind is this is a paradigm shift. It's not about matching - *it's about NOT matching*. And although it is far better at catching spam, its best feature is actively identifying good email. The secret sauce Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's get some lunch". In fact there are an infinite number of words and phrases that are used in good email that are never ever used in spam. And if I'm using words and phrases *never used in spam* that are used in ham - it's good email. And similarly - if I'm using words and phrases that are used in spam and *never used in ham* - it's spam. So - how do I get a list of words and phrases never used in spam? I create a list of words and phrases that are used in spam and check to see if it's *not on the list*.
What I do is tokenize the spammiest parts of the email, like the subject line, into phrases of 1, 2, 3, and 4 words. The phrase "the quick brown fox jumps over the lazy dog" becomes "the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" "brown fox" "quick brown fox" "the quick brown fox" "jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps" "over" "jumps over" "fox jumps over" "brown fox jumps over" "the" "over the" "jumps over the" "fox jumps over the" "lazy" "the lazy" "over the lazy" "jumps over the lazy" "dog" "lazy dog" "the lazy dog" "over the lazy dog". These tokens are learned as ham or spam and added to sets. I'm using Redis to do this because it has extremely fast set operations. I don't know of anything other than Redis that can do this. So think about Redis as the way to implement this. A new message comes in. It is tokenized and fingerprinted, and hundreds of fingerprints are generated. Then it's all set operations. The set of fingerprints of the test message is intersected with the spam and ham corpora, creating subsets of matches. Then you do a set diff both ways (ham - spam) (spam - ham) and whichever side is bigger wins. Generally it will match on only one side or very predominately on one side. So I'm not just tokenizing the subject. I also tokenize the first 25 words of the message, the text of links in the message, the name part of the from address, the header names, the attachment names, the PHP script if there is one, and various behavior characteristics (slow, no quit, no RDNS, number of MIME parts, multiple recipients, etc.). SpamAssassin is all about matching rules. This is all about not matching. Not matching allows you to compare to an infinite set rather than a finite set. So when spammers start misspelling words to not match the rules, my system catches that and makes its own rules. The tricks that spammers use to avoid matching make it easier to catch them using this method.
I will post a link to a better explanation later when I write one, but I wanted to let you all know this wasn't just a tease from some crazy person. So - here's what I want to see happen. I'd like to see SA implement this. I will provide a license to include with it, giving most people a free license, sort of like how Spamhaus isn't free to everyone but it's in SA. Then the new method will take off and eventually I'll get a little something for this. This new method (I'm calling it the Evolution Spam Filter because the algorithm mimics evolution) doesn't just block spammers, it decimates spammers. It's not just a treatment - it's the cure. I hate spam, and although I could have kept this secret and made money having the best spam filter on the planet, I decided I had a moral obligation to make this generally available. I think this will save the global economy billions of dollars in recovered productivity and crime and fraud prevention. I'm seeing close to 100% accuracy. It is so accurate it's scary.
Re: My new method for blocking spam - REVEALED!
On 01/20/16 10:36, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: . So it still needs to be trained, at least initially, with a manually-vetted corpus. If not, how do you propose to do the initial classification of messages for training? Do you envision it being self-training past that point? What if it goes off the rails? How would you keep it from going off the rails? If it's not self-training then you have the same issues with the reliability of the people feeding the training corpus. On my system I have a long list of good email sources that are 100% whitelisted and I also have hackerbot traps that are 100% spam. I use these for training to keep it on the rails. Good question though. So I'm not just tokenizing the subject. Also the first 25 words of the message OK, good. I was thinking it would be *really* sensitive to "bayes poisoning". Ignoring all but the first part of the body helps. I assume you're only considering the portion that would render as visible to the recipient. Of course, that brings in all the logic regarding "what is visible to the recipient?" and all the HTML obfuscation we're already seeing to get around Bayes and "only scan the first part of the message". Actually it's very insensitive to poisoning. Yes, a spammer might cancel out some good phrases every now and then, but since my system relies on NOT matching on one side it's not as sensitive as Bayes. If they poison with the same phrases twice I have them. -- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
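The answer implies that only messages from trusted sources get learned: whitelisted senders train the ham side and trap hits train the spam side, and everything else is left alone. Below is a sketch of that routing in Python. `HAM_SOURCES`, `SPAM_TRAPS` and `learn` are hypothetical names, and checking the trap first (so a forged "whitelisted" sender that hits a trap still counts as spam) is an assumption the post does not spell out.

```python
# Hypothetical trusted sources; the real lists are not described in the post.
HAM_SOURCES = {"example.org"}        # whitelisted sender domains
SPAM_TRAPS = {"trap@example.net"}    # addresses that only ever receive spam


def learn(tokens, sender, recipient, ham_set, spam_set):
    """Add tokens to a corpus only when the message comes from a trusted source.

    Returns "spam", "ham", or None when the message is not learned at all.
    """
    # Assumption: trap hits win, so a forged whitelisted sender is still spam.
    if recipient.lower() in SPAM_TRAPS:
        spam_set.update(tokens)
        return "spam"
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in HAM_SOURCES:
        ham_set.update(tokens)
        return "ham"
    # Unknown source: skip learning to keep the corpora "on the rails".
    return None
```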
More details on my evolution filter patent
Here are the details of how the filtering system is structured. This is what I filed: http://www.junkemailfilter.com/patent/patent.pdf Drawings with it: http://www.junkemailfilter.com/patent/patent1.pdf http://www.junkemailfilter.com/patent/patent2.pdf http://www.junkemailfilter.com/patent/patent3.pdf http://www.junkemailfilter.com/patent/patent4.pdf http://www.junkemailfilter.com/patent/patent5.pdf This will be my licensing template: https://wiki.creativecommons.org/wiki/Model_Patent_License
Re: My new method for blocking spam - REVEALED!
On 01/20/16 11:25, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. the header structure, behavior. File extensions of attached files. Are you getting .zip/.rar/etc archive directory listings for that, too? I recommend you do so, that would help trap malware. I throw the extensions into the mix as "information". The filter makes the associations. If there are file extensions that spammers never use then the message is ham.
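Throwing header structure, the From display name and attachment extensions "into the mix as information" needs some way to keep those features apart from ordinary words. One option, sketched here with Python's standard `email` package, is to prefix each feature with a namespace, so that an extension token like `ext:.zip` can never collide with the word "zip" in a subject. The prefixes and the name `feature_tokens` are assumptions for illustration. The subject is split into single words only to keep the example short; the phrase tokenizer would normally be used there.

```python
import email
import os
from email.utils import parseaddr


def feature_tokens(raw: str) -> set[str]:
    """Namespaced tokens for a few non-body features of a message (illustrative)."""
    msg = email.message_from_string(raw)
    tokens = set()
    # Header structure: the names of the headers present, not their values.
    for name in msg.keys():
        tokens.add("hdr:" + name.lower())
    # Display-name part of the From address, e.g. "Alice Example".
    display_name, _address = parseaddr(msg.get("From", ""))
    if display_name:
        tokens.add("fromname:" + display_name.lower())
    # Subject words (single words only, to keep the sketch short).
    for word in str(msg.get("Subject", "")).lower().split():
        tokens.add("subj:" + word)
    # Extensions of attached file names, e.g. "ext:.zip"; "ext:none" if there is none.
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            ext = os.path.splitext(filename)[1].lower()
            tokens.add("ext:" + (ext or "none"))
    return tokens
```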
Re: More details on my evolution filter patent
On 01/20/16 11:45, Axb wrote: On 01/20/2016 08:24 PM, Marc Perkel wrote: Here's the details of how the filtering system is structured. This is what I filed: http://www.junkemailfilter.com/patent/patent.pdf Drawings with it. http://www.junkemailfilter.com/patent/patent1.pdf http://www.junkemailfilter.com/patent/patent2.pdf http://www.junkemailfilter.com/patent/patent3.pdf http://www.junkemailfilter.com/patent/patent4.pdf http://www.junkemailfilter.com/patent/patent5.pdf This will be my licensing template: https://wiki.creativecommons.org/wiki/Model_Patent_License And what does all this FUSSP have to do with SA? It could probably be implemented in SA with little effort.
My new method for blocking spam - example
Let me give you an example. Here are two subject lines. It's easy to guess which one is spam. "Meet horny Russian Brides online!" "I read an article about Russian brides in a magazine." Bayes or SpamAssassin would look at "Russian Brides" and 499 out of 500 times it's spam. Therefore the nonspam version scores spam points. In my system "Russian brides" is neutral because it is used in both spam and ham. On the spam side are phrases used in other spam and *not matched* in ham: "Meet horny" "horny Russian" "horny Russian brides" "brides online!" "online!" On the ham side are phrases used in ham and *not matched* in spam: "I read an article" "read an article" "an article about" "brides in a magazine" "in a magazine" My filter gets both correct because of NOT matching. Not matching is a comparison to an infinite set.
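The example can be checked with plain Python sets. The two tiny corpora below are invented for illustration and are not Marc's data. `phrases` is a hypothetical helper that builds the same 1- to 4-word phrases (lower-cased, which is an assumption). "russian brides" is in both corpora, so it survives neither set difference. That is the "neutral" behaviour described above.

```python
def phrases(text: str, max_len: int = 4) -> set[str]:
    """All contiguous 1..max_len word phrases, lower-cased (an assumption)."""
    words = text.lower().split()
    return {" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + max_len, len(words)) + 1)}


# Toy corpora invented for this illustration.
SPAM = phrases("meet horny singles") | phrases("russian brides online!")
HAM = phrases("i read an article about it") | phrases("russian brides in history")


def sides(subject: str):
    """Return (phrases seen only in spam, phrases seen only in ham) for a subject."""
    test = phrases(subject)
    spam_side = (test & SPAM) - HAM   # seen in spam, never in ham
    ham_side = (test & HAM) - SPAM    # seen in ham, never in spam
    return spam_side, ham_side


for subject in ("Meet horny Russian Brides online!",
                "I read an article about Russian brides in a magazine."):
    spam_side, ham_side = sides(subject)
    print(subject, "->", len(spam_side), "spam-only,", len(ham_side), "ham-only")
```

With these corpora the first subject leaves 6 spam-only phrases and no ham-only ones, and the second leaves 17 ham-only phrases and no spam-only ones.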
Re: My new method for blocking spam - REVEALED!
On 01/20/16 10:44, Antony Stone wrote: On Wednesday 20 January 2016 at 17:52:05, Marc Perkel wrote: Suppose I get an email with the subject line "Let's get some lunch". I know it's a good email because spammers never say "Let's go to lunch". In fact there are an infinite number of words and phrases that are used in good email that are never ever used in spam. Surely this is going to change as soon as enough people implement your filtering system - spammers will use legitimate phrases from ham, both in the subject line and the body of their emails, and thereby get classified as ham? Matching ham doesn't get you classified as ham; not matching spam does. Matching ham is neutral if spammers use it too. At some point the spammer wants you to do something, and if they imitate ham perfectly then they don't have a message and it's no longer spam. (Except I tokenize behavior as well.) So, you're identifying ham by checking that it does not contain words or phrases which you have previously seen in spam... Sounds very much like Bayes to me. Bayes compares the new email to what's inside the ham and spam boxes. What I do is compare inside the box on one side and outside the box on the other. Bayes is about matching. The Evolution filter is about NOT matching. What I do is tokenize the spammiest parts of the email, like the subject line How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. The header structure, behavior. File extensions of attached files. Name part of the From address, text inside links. I'd like to see SA implement this. I'm not going to share my code because my code is specific to my system and it's a combination of bash scripts, Redis, Pascal, PHP, and Exim rules. And the open source programmers are likely to implement it better than I have.
Given that you have *some* source code, no matter how bad / buggy / specific it is, I think you'll get much greater take-up (and also comprehension of exactly what your technique is) if you at least publish that and invite people to improve on it, rather than say "here's a method idea - you guys code it". The heart of the code is what I do with Redis. It's just set operations. Intersect Ham diff Spam to get ham matches. Intersect Spam diff Ham to get spam matches. Count the lines - Subtract the result - and you have a score. I'm seeing close to 100% accuracy. 1. How close? Less than 10 a day filtering 5000 domains. 2. On what volume of email? 1.3 million good emails last week. 3. What proportion of spam / ham? About 10 spam to one ham. But I have a spam baiting system so I get more spam than normal. 4. What % false positives / negatives? Especially good at identifying ham. 5. How many different domains' email are you feeding in to it? 6. How long have you been testing it (ie: how much have you seen of how it adapts to new spam patterns)? About 4 weeks now.
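The scoring step ("intersect, diff, count, subtract") fits in a few lines. This sketch uses Python sets as a stand-in for Redis. The comments name the Redis set commands (`SINTERSTORE`, `SDIFFSTORE`) that could do the same work server-side, but the key names in them are hypothetical. Reporting a zero score as "undecided" is an assumption that mirrors the "no result" case mentioned elsewhere in the thread.

```python
def evolution_score(test: set, spam: set, ham: set) -> int:
    """card((Test & Spam) - Ham) - card((Test & Ham) - Spam); > 0 leans spam."""
    # Redis (hypothetical keys): SINTERSTORE tmp:s msg spam, then
    # SDIFFSTORE tmp:s_only tmp:s ham returns the size of the spam-only set.
    spam_only = (test & spam) - ham
    # Same the other way: SINTERSTORE tmp:h msg ham; SDIFFSTORE tmp:h_only tmp:h spam.
    ham_only = (test & ham) - spam
    return len(spam_only) - len(ham_only)


def verdict(score: int) -> str:
    """Positive = spam, negative = ham; a tie (including no matches) is undecided."""
    if score > 0:
        return "spam"
    if score < 0:
        return "ham"
    return "undecided"
```

Note the parentheses: in Python, `-` binds tighter than `&`, so `test & spam - ham` would compute something different.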
Re: My new method for blocking spam - REVEALED!
On 01/20/16 11:32, Reindl Harald wrote: On 20.01.2016 at 20:27, Marc Perkel wrote: On 01/20/16 11:25, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: On 01/20/16 10:44, Antony Stone wrote: How do you identify "the spammiest parts" of an email? The Subject line - the first few words of the email. the header structure, behavior. File extensions of attached files. Are you getting .zip/.rar/etc archive directory listings for that, too? I recommend you do so, that would help trap malware. I throw the extensions into the mix as "information". The filter makes the associations. If there are file extensions that spammers never use then the message is ham. Don't get me wrong, but "file extensions that spammers never use" does not happen, given that usable file extensions are limited, and it's easy to trick by using attachments with random extensions that are not part of the payload. All I read here is "I have re-invented Bayes" but called it something different. Bayes is about matching. My Evolution filter is about NOT matching. It's the *NOT matching* that makes it different.
The difference between my Evolution filter and Bayes is ...
Bayes compares the test message to what's in the Ham corpus and what's in the Spam corpus and comes up with a number indicating it's more like one or the other. Evolution matches the Ham corpus and does not match the Spam corpus to get a ham score. Then it matches the Spam corpus and does not match the Ham corpus to get a spam score. Usually the results are all on one side or the other, or at least very predominantly on one side. Sometimes, though, I get no result: either it matches both or neither. But I also look at headers and behavior, and that usually gives results too. I'm tracking 8 attributes now.
Re: My new method for blocking spam - example
On 01/20/16 11:50, Reindl Harald wrote: On 20.01.2016 at 20:46, Marc Perkel wrote: Let me give you an example. Here's 2 subject lines. Easy to guess which one is spam. "Meet horny Russian Brides online!" "I read an article about Russian brides in a magazine." Bayes or spam assassin would look at "Russian Brides" and 499 out of 500 times it's spam. Therefore the nonspam version scores spam points. In my system "Russian brides" is neutral because it is used in both spam and ham. But on the spam side, phrases used in other spam *not matched* in ham That is *exactly* how Bayes works, and the subject alone is *not* the key. Tokenizing the *whole* message with enough spam *and* ham samples is the key - so there are two options: * you re-invented Bayes with a different name * you modified Bayes with some tricks and hope spammers would not adopt them. Anyway, I doubt there is a sane reason for a patent because the principles are just prior art -> Bayes. Again - Bayes compares what matches. My filter compares what doesn't match.
Re: My new method for blocking spam - REVEALED!
On 01/20/16 12:05, RW wrote: On 01/20/16 10:26, Shawn Bakhtiar wrote: Sorry.. how is this different than Naive Bayes filtering?? On Wed, 20 Jan 2016 10:52:58 -0800 Marc Perkel wrote: Yes - you missed something. It is about intersecting one corpus and NOT intersecting the other. This is about what doesn't match - not what does. What you are doing is a special case of an ordinary Bayesian filter. If you remove Robinson's correction for low-count tokens, or adjust the Robinson parameters so it has no effect, you end up with tokens that only occur in spam having a probability of 1, tokens that only occur in ham having a probability of 0 and tokens that occur in both having a probability in-between. If you set a cut-off of 0.49... you leave only the pure tokens behind. And because all the probabilities are 0 or 1, the chi-squared test reduces to comparing the number of spammy and hammy tokens, just as you are doing. Your multi-word tokenization is exactly the same as in Bogofilter, and most of what you are doing can be done in Bogofilter with a few lines in the configuration file. Any value in your scheme must be in the selection of what you tokenize. The rest is likely holding it back. Again - it's not about matching as Bayes does. It's about not matching. In the subject line of the message the phrase "method for blocking spam" makes the message ham. Spammers never use the phrase "method for blocking spam". No other tests needed. My system's result is 100% ham. To Bayes it's just some words. What makes it ham is what doesn't match, not what does.
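RW's point can be illustrated with a toy calculation. The sketch below computes an uncorrected per-token spam probability s / (s + h) from hypothetical token counts, keeps only tokens whose probability is exactly 0 or 1, and compares the two counts. It then checks that this gives the same number as the set-difference score. This is not Bogofilter's or SpamAssassin's actual code: Robinson's correction and the chi-squared combining step are deliberately left out, which is the simplification RW describes. A real Bayes filter also normalises counts by the number of spam and ham messages, but that does not move a probability away from exactly 0 or 1.

```python
from collections import Counter


def pure_token_vote(test_tokens, spam_counts: Counter, ham_counts: Counter) -> int:
    """Count tokens with p(spam) == 1 minus tokens with p(spam) == 0."""
    spammy = hammy = 0
    for tok in set(test_tokens):
        s, h = spam_counts[tok], ham_counts[tok]  # Counter returns 0 for unseen tokens
        if s + h == 0:
            continue  # never seen: no evidence either way
        p = s / (s + h)  # no Robinson correction
        if p == 1.0:
            spammy += 1
        elif p == 0.0:
            hammy += 1
        # 0 < p < 1: token seen in both corpora, dropped by the cut-off
    return spammy - hammy


def set_score(test_tokens, spam_counts: Counter, ham_counts: Counter) -> int:
    """The set formulation: |(T & S) - H| - |(T & H) - S|."""
    t = set(test_tokens)
    spam = {tok for tok, c in spam_counts.items() if c > 0}
    ham = {tok for tok, c in ham_counts.items() if c > 0}
    return len((t & spam) - ham) - len((t & ham) - spam)
```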
Re: My new method for blocking spam - REVEALED!
On 01/20/16 12:14, Reindl Harald wrote: On 20.01.2016 at 21:11, Marc Perkel wrote: [...] In the subject line of the message the phrase "method for blocking spam" makes the message ham. Spammers never use the phrase "method for blocking spam". No other tests needed. My system result 100% ham. To bayes it's just some words. What makes it ham is what doesn't match, not what does. "Spammers never use the phrase" is pure bullshit - sorry, no way to express it nicer! The way I know what spammers never use is that I store what spammers do use and check whether it doesn't match. I've processed more than 100 million spams and it's amazing how many common words and phrases spammers never use.
Re: My new method for blocking spam - REVEALED!
It could be challenging if someone impersonated a bank and did it right. I'm looking at more aspects than just the content of the message, but that's an area where there is some possible weakness. There are other tricks to address that specifically. And I am looking at behavior and headers as well. To be a little clearer: this new system isn't perfect, and its main strength is identifying good email. It does catch a lot more spam for sure, but when people scream at me it's because I blocked something important. So think of detecting ham as its big feature. On 01/20/16 14:37, jdow wrote: And just how well does this work against spear phishing? And would the same magic list work for Ma and Pa Kettle well into their 80s only receiving emails from their children and Freddie Burfle with his head buried in a corporate accounts payable office? {^_^}
Re: Another way to use my filter in SA
On 01/20/16 14:28, John Hardin wrote: On Wed, 20 Jan 2016, Marc Perkel wrote: Here's another way to use my evolution filtering idea with SA. Get rid of all the rule scores and just make a list of the rule names. From the rule names generate all combinations of those rule names up to 4 rule names in a fingerprint and learn those fingerprints as either ham or spam. sort of like this: “A” “AB” “B” “C” “AC” “ABC” “BC” “D” “AD” “ABD” “BD” “CD” “ACD” “ABCD” “BCD” “E” “AE” “BE” “CE” “ACE” “BCE” “DE” “ADE” “ABDE” “BDE” “CDE” “ACDE” “ABCDE” “BCDE” Then - when a new message comes in you make the same combo of fingerprints from the rule names and then use my formula. card(Test intersect Spam diff Ham) - card(Test Intersect Ham diff Spam) Positive result = spam Negative result = ham Unfortunately this also requires training. It would render SA a product that does not work out-of-the-box. Actually it could include a pretrained corpus on the rules at least to get people started. Could also have someone (like me?) provide it as a service that SA would talk to. SA would send the tokens to the service and the service would return a score.
Another way to use my filter in SA
Here's another way to use my evolution filtering idea with SA. Get rid of all the rule scores and just make a list of the rule names. From the rule names generate all combinations of those rule names up to 4 rule names in a fingerprint and learn those fingerprints as either ham or spam. sort of like this: “A” “AB” “B” “C” “AC” “ABC” “BC” “D” “AD” “ABD” “BD” “CD” “ACD” “ABCD” “BCD” “E” “AE” “BE” “CE” “ACE” “BCE” “DE” “ADE” “ABDE” “BDE” “CDE” “ACDE” “ABCDE” “BCDE” Then - when a new message comes in you make the same combo of fingerprints from the rule names and then use my formula. card(Test intersect Spam diff Ham) - card(Test Intersect Ham diff Spam) Positive result = spam Negative result = ham
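A Python sketch of the rule-name fingerprints and the formula above. It generates every combination of 1 to 4 rule names; for five rules A..E that is 5 + 10 + 10 + 5 = 30 fingerprints. Names are sorted so the same set of rules always produces the same fingerprint, and they are joined with "+" (an assumption) because real SpamAssassin rule names are multi-character, so plain concatenation could be ambiguous. The function names are hypothetical, and the rule names in the usage part are only illustrative inputs, not real training data.

```python
from itertools import combinations


def rule_fingerprints(rule_names, max_size: int = 4) -> set:
    """All combinations of 1..max_size rule names, as canonical strings."""
    names = sorted(set(rule_names))  # canonical order: {B, A} always gives "A+B"
    return {"+".join(combo)
            for size in range(1, max_size + 1)
            for combo in combinations(names, size)}


def score(test: set, spam: set, ham: set) -> int:
    """card((Test & Spam) - Ham) - card((Test & Ham) - Spam): positive = spam, negative = ham."""
    return len((test & spam) - ham) - len((test & ham) - spam)


if __name__ == "__main__":
    # Illustrative training data: rule hits on one known spam and one known ham.
    spam = rule_fingerprints(["URIBL_BLACK", "RDNS_NONE", "HTML_MESSAGE"])
    ham = rule_fingerprints(["DKIM_VALID", "HTML_MESSAGE"])
    test = rule_fingerprints(["URIBL_BLACK", "HTML_MESSAGE"])
    print(score(test, spam, ham))
```

Here "HTML_MESSAGE" alone is neutral (it was learned on both sides), while "URIBL_BLACK" and the pair "HTML_MESSAGE+URIBL_BLACK" exist only on the spam side, giving a score of 2.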
Re: I have developed a new method of blocking spam that's a game changer
Actually the reason I filed the provisional patent is to start talking about it. As long as I file for a real patent in the next year I'm good. On 01/14/16 03:49, Dianne Skoll wrote: On Wed, 13 Jan 2016 18:01:09 -0800 Marc Perkel <supp...@junkemailfilter.com> wrote: When I reveal it I can explain the basic concept in about 2 paragraphs. The core idea is amazingly simple. OK. What you need to do next is stop talking about it. :) If you disclose details, you risk ineligibility for patent protection, depending on how much time elapses between disclosure and the patent application. Next up, you need to get a patent lawyer. And plan on spending about $20,000 to do everything you need to get a patent. You need to have someone do a patent search and offer an opinion as to patentability. Then you need to pay the attorney fees, filing fees, etc. Good luck! Regards, Dianne.
Re: I have developed a new method of blocking spam that's a game changer
Nope - that's not it. When I reveal it I can explain the basic concept in about 2 paragraphs. The core idea is amazingly simple. On 01/13/16 17:52, Dianne Skoll wrote: Well... You're light on details, but from the few clues you've given, is it possible you've (re-)invented a genetic algorithm for spam classification? http://ieeexplore.ieee.org/xpl/login.jsp?tp==5982390=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5982390 http://teaching.csse.uwa.edu.au/year4/Current/Students/Files/2007/JamesDudley/CorrectedDissertation.pdf There are quite a few research papers on this. Anyway, I can't test out your method, but I'll sure keep an eye on how things evolve. :) Regards, Dianne.
I have developed a new method of blocking spam that's a game changer
OK - this might sound a little unbelievable but I'm not making this up. I want to introduce this because I'm hoping to release this soon and I want to create some buzz and anticipation. I'm not going to talk about the details yet but I hope to soon. I just filed a provisional method patent on the method and tomorrow I'm going to be talking to some investor types about it. I'm also working on improving the methods I'm using, but this new trick is so accurate that if someone had asked me 1 month ago whether this level of accuracy was possible, I would have said - no way! I'm calling it the Evolution Filter. The name is somewhat of a clue to how it works. I'm seeing levels of accuracy getting really close to 100%. And it's especially good at actively detecting good email, so false positives are almost nonexistent. I've been filtering spam now for 15 years and been on this list for about that long, and I'm not the kind of guy to just make this stuff up. My intent right now is to just get enough IP protection so I can get a license fee from the big corps. I plan on giving it away free to the little guys, so that if you have less than 10,000 email accounts it's free. Hoping to get like 1 cent per email account per year from the big guys. Although this idea is very unique, it's actually rather simple to implement. I'm using Redis, and since SA is also using Redis it should be trivial to add it to SA. My programming skills are good but not great, so the developers here should be able to do a significantly better job than me. It only took me an afternoon to implement the concept and it was already impressive with just 3 hours of learning. This is not Bayesian or remotely similar to Bayesian. It does use a DB like Bayesian does and there is learning involved. But it's probably 100x better at detecting spam and 1000x better at detecting good email. My plan is that this technique is going to be so good that everyone is going to immediately implement it.
And because of that the big boys will license it from me. The accuracy is so good that it could put many spammers out of business. It can recognize spam more accurately than I can by hand when looking at someone else's email. If someone on this list wants to verify that I'm not just smoking the wrong kind of cigarettes, I'm willing to let people test it on the condition that you report back here and tell everyone what your experience is. If anyone has some feedback about how I can make this available to everyone and make a little something in licensing fees, I'm definitely listening. I do want to release this to you all soon because you'll probably make it better than I have. I have a little more info on Dvorak's blog. http://www.dvorak.org/blog/2016/01/12/i-invented-a-new-way-to-filter-spam-thinking-about-a-patent/
Looking for a script to extract readable text from emails
I'm looking for a script to extract readable text from emails. I want it de-MIMEd, ignoring HTML, images, etc. What I'm looking for is just the readable text (real words). Mostly I just need to extract about the first 200 characters of real text. Can someone point me in the right direction? Thanks in advance.
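One possible answer, sketched with Python's standard `email` package rather than anything from the poster's own stack: parse the message, prefer the text/plain body and fall back to text/html, strip tags, unescape entities, collapse whitespace and keep the first 200 characters. Stripping tags with regular expressions is a crude assumption; messy real-world HTML may need a proper HTML parser. `readable_text` is a hypothetical name.

```python
import email
import html
import re
from email import policy


def readable_text(raw: bytes, limit: int = 200) -> str:
    """Return roughly the first `limit` characters of human-readable body text."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Prefer a text/plain body; fall back to text/html if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""  # e.g. an image-only message
    text = body.get_content()  # decodes transfer encoding and charset
    if body.get_content_type() == "text/html":
        # Crude tag stripping (assumption): drop script/style blocks, then any tag.
        text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", text)
        text = re.sub(r"<[^>]+>", " ", text)
        text = html.unescape(text)
    # Collapse runs of whitespace so the character budget goes to real words.
    text = " ".join(text.split())
    return text[:limit]
```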
Re: Trying Bayes / Redis
This Bayes Redis works GREAT. For years I've been trying to get bayes to work and now finally IT WORKS
Re: A Plan to Stop Violence on Social Media
Probably yes. But talk about opening a can of worms. If you can detect ISIS you can detect anything. On 12/15/15 20:19, Wrolf wrote: Stop me if you've heard this one. Would it be practical to use the Spamassassin techniques of Bayesian filtering and RBL lists to block ISIS on social media? Wrolf wr...@wrolf.net
Ignoring internal networks
I have situations where I need to run SA on a message that comes from another server. But the server it's coming from is forwarding the message and I want that server to be ignored (not scored white) so that SA can see the message as if it came to me directly. I bet there's a command to do that. I just can't find it.
Re: Ignoring internal networks
On 12/14/15 22:49, Benny Pedersen wrote: On December 15, 2015 6:18:06 AM Marc Perkel <supp...@junkemailfilter.com> wrote: I bet there's a command to do that. I just can't find it. perldoc Mail::SpamAssassin::Conf see *networks I figured it out. Thanks. internal_networks
OT - Has anyone tried RSPAMD?
And if you have - is it any good? Or am I wasting my time with it? Thanks in advance. I know it's off topic.
Re: Trying Bayes / Redis
On 12/12/15 08:47, Reindl Harald wrote: On 12.12.2015 at 17:44, Marc Perkel wrote: On 12/12/15 02:38, Axb wrote: On 12/12/2015 12:28 AM, Marc Perkel wrote: redis_version:2.4.10 You're using an ancient Redis version... SA makes use of LUA support which was added in 2.6.0 You definitely need to upgrade to 3.x and you'll probably need to nuke your DB dump before the upgrade... I'm running Centos 6. I just downloaded and compiled the latest. Can I just copy over the executables and keep my data or do I need to start over? If it works with the old data you can keep it, otherwise as said "probably need to nuke your DB dump", but don't do make && make install on production machines; sooner or later your build will get overwritten by a yum update with the old version + patches because rpm doesn't know anything about your override https://wiki.centos.org/HowTos/SetupRpmBuildEnvironment Worked with the old data.
Re: Trying Bayes / Redis
On 12/12/15 02:38, Axb wrote: On 12/12/2015 12:28 AM, Marc Perkel wrote: redis_version:2.4.10 You're using an ancient Redis version... SA makes use of LUA support which was added in 2.6.0 You definitely need to upgrade to 3.x and you'll probably need to nuke your DB dump before the upgrade... What does LUA mean?
Re: Trying Bayes / Redis
On 12/12/15 02:38, Axb wrote: On 12/12/2015 12:28 AM, Marc Perkel wrote: redis_version:2.4.10 You're using an ancient Redis version... SA makes use of LUA support which was added in 2.6.0 You definitely need to upgrade to 3.x and you'll probably need to nuke your DB dump before the upgrade... I'm running Centos 6. I just downloaded and compiled the latest. Can I just copy over the executables and keep my data or do I need to start over?
Re: Trying Bayes / Redis
On 12/12/15 09:36, Martin Gregorie wrote: On Sat, 2015-12-12 at 08:49 -0800, Marc Perkel wrote: I can put redis in the yum exception list Or, you can install your self-built version in /usr/local/bin and adjust $PATH so it precedes /usr and /usr/bin. This will protect your version from yum or dnf updates. Then keep an eye on subsequent updates and remove your version if/when yum or dnf delivers a more recent version. Martin Actually I did rpm -e --justdb to take it out so it doesn't get updated. BTW - this is working GREAT. I'm thinking about learning Redis and doing interesting things with it.
Re: Trying Bayes / Redis
On 12/12/15 08:47, Reindl Harald wrote: On 12.12.2015 at 17:44, Marc Perkel wrote: On 12/12/15 02:38, Axb wrote: On 12/12/2015 12:28 AM, Marc Perkel wrote: redis_version:2.4.10 You're using an ancient Redis version... SA makes use of LUA support which was added in 2.6.0 You definitely need to upgrade to 3.x and you'll probably need to nuke your DB dump before the upgrade... I'm running Centos 6. I just downloaded and compiled the latest. Can I just copy over the executables and keep my data or do I need to start over? If it works with the old data you can keep it, otherwise as said "probably need to nuke your DB dump", but don't do make && make install on production machines; sooner or later your build will get overwritten by a yum update with the old version + patches because rpm doesn't know anything about your override https://wiki.centos.org/HowTos/SetupRpmBuildEnvironment I can put redis in the yum exception list
Re: Trying Bayes / Redis
sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 69569 0 non-token data: nspam
0.000 0 88747 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

# Server
redis_version:3.0.5
redis_git_sha1:
redis_git_dirty:0
redis_build_id:a0e516305b2572d8
redis_mode:standalone
os:Linux 2.6.32-042stab112.15 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.7
process_id:2085
run_id:f2436709b788f7a9b6a043f6ed90a49512c049d2
tcp_port:6379
uptime_in_seconds:685
uptime_in_days:0
hz:10
lru_clock:7100234
config_file:/etc/redis.conf

# Clients
connected_clients:550
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:905452784
used_memory_human:863.51M
used_memory_rss:924852224
used_memory_peak:964409608
used_memory_peak_human:919.73M
used_memory_lua:177152
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:98170
rdb_bgsave_in_progress:1
rdb_last_save_time:1449940748
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:7
rdb_current_bgsave_time_sec:1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:3806
total_commands_processed:1302390
instantaneous_ops_per_sec:2133
total_net_input_bytes:52637812
total_net_output_bytes:15806201
instantaneous_input_kbps:80.48
instantaneous_output_kbps:23.47
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:194094
keyspace_misses:65528
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:44672
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:5.97
used_cpu_user:34.09
used_cpu_sys_children:4.85
used_cpu_user_children:57.29

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=3,expires=0,avg_ttl=0
db2:keys=9277218,expires=4,avg_ttl=2546306553
Re: Trying Bayes / Redis
On 12/12/15 15:28, Benny Pedersen wrote:
> On December 12, 2015 5:49:27 PM Marc Perkel <supp...@junkemailfilter.com> wrote:
>> I can put redis in the yum exception list
> So many precompiled problems. Why not upgrade to CentOS 7?

Because I'd have to upgrade 50 servers for consistency, and if I do that I'll probably try something other than CentOS.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
Re: Trying to understand how bayes works.
On 12/11/15 06:58, RW wrote:
> On Thu, 10 Dec 2015 13:54:05 -0800 Marc Perkel wrote:
>> Bayes breaks the message down into some sort of tokens and then does statistics on those tokens as to tokens found in spam vs. tokens found in ham. But what about combinations of tokens? I'm thinking that I'd like to have something that says when it sees tokens X and Y and Z then that's spam even though X,Y,Z might be in ham when not combined. Does bayes do that or is there anything that does?
>
> In general making arbitrary combinations is not practical. What some filters do is make tokens out of word combinations in a sliding window. This can be very useful in catching difficult spams that are composed of common neutral words, although in my experience it's a little more prone to FPs than Bayes. I use Bogofilter and DSPAM.
>
> On Thu, 10 Dec 2015 21:28:44 -0800 Marc Perkel wrote:
>> I'm thinking about incorporating Bogofilter but instead of feeding it messages I'm thinking about feeding it the SpamAssassin results - the rule names it hit + other data about the message and then let it score the rules. That's what I want to experiment with.
>
> I thought of trying something like that myself, but my filtering became practically perfect before I got around to it, so I never bothered. And I think there are some problems with it.
>
> The first is that FNs in SpamAssassin tend to come from a lack of useful information rather than the scoring system failing to combine it well.
>
> The second is that most rules are either fairly neutral or strongly spammy. There are few strong ham indicators to balance the rest. You might be able to balance it with metadata and reputation information, but the trick is to do it without getting a high FP rate on new senders.
>
> If you did wish to take account of rule combinations, you'd really have to do it yourself because sliding-window tokenization wouldn't do it well.

What I was thinking about doing was creating a string of tokens that represented key features of the message.
Then run that through a program that creates new tokens out of every possible combination of 2 tokens and adds them to the string, then run Bayes on that. My tokens will not be the text of the message but the rules hit, including a lot of rules I create not for points but just for tokens. For example, I create rules that look for many phrases about a subject, and the subject becomes a token. For example:

JESUS
ROYALTY
MONEY

By themselves these are not an indicator of spam, but if you have all 3 then it's definitely spam. The idea is not to look at words but at the meaning of phrases. For instance, introductions:

Dear (friend)
I am (someone)
I am contacting you because (some reason)

This says: I don't know you.

I am a member of the (Nigerian royal family|Armed forces in Iraq) etc.

These can all be reduced to tokens, and then you just look for combinations of tokens.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
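RW contrasts sliding-window tokens (adjacent words glued together) with Marc's plan of pairing every feature token with every other one. A minimal sketch of both, assuming each message has already been reduced to a list of words or feature tokens such as the JESUS/ROYALTY/MONEY examples above; the function names and the `+` joiner are hypothetical, not part of Bogofilter, DSPAM or SpamAssassin.

```python
from itertools import combinations


def sliding_window_tokens(words, width=2):
    """Join each run of `width` adjacent words into one token (word n-grams)."""
    return [" ".join(words[i:i + width]) for i in range(len(words) - width + 1)]


def pairwise_tokens(features):
    """Return the features plus one synthetic token per unordered pair."""
    unique = sorted(set(features))  # order-independent, duplicates dropped
    pairs = [f"{a}+{b}" for a, b in combinations(unique, 2)]
    return unique + pairs


words = "I am contacting you because".split()
print(sliding_window_tokens(words))
# ['I am', 'am contacting', 'contacting you', 'you because']

features = ["JESUS", "ROYALTY", "MONEY"]
print(pairwise_tokens(features))
# ['JESUS', 'MONEY', 'ROYALTY', 'JESUS+MONEY', 'JESUS+ROYALTY', 'MONEY+ROYALTY']
```

The sliding window only pairs things that sit next to each other in the text, which is why it does not capture rule combinations well. The all-pairs approach grows as n(n-1)/2, so 100 feature tokens add 4,950 pair tokens per message, and a naive Bayes filter still treats each pair token as independent evidence: an "all 3 together" pattern only gets pairwise support unless triples are added too.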
Re: Trying Bayes / Redis
So far so good. Thanks for your help. It looks like it's actually working. Have about 10k spam/ham tokens so far.
Re: Trying Bayes / Redis
On 12/11/15 13:07, Axb wrote:
> On 12/11/2015 09:57 PM, Marc Perkel wrote:
>> Trying to set up Bayes using redis. I created a server for redis and have it running. Can't find the docs on how to create the databases initially and such. Any tips appreciated. Thanks in advance.
>
> Did you look into the docs I put in https://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/ ?
>
> You don't have to "create" the DB - just look at the redis.conf sample... The moment the redis server gets data, it will "autocreate" the first database, which is the one you'll be using for Bayes.
>
> Those little bits should get you going.
>
> Definitely keep an eye on memory usage * 2. By tuning
>
> bayes_token_ttl 30d
> bayes_seen_ttl 14d
>
> you can "tune usage".
>
> IMPORTANT: If you want to do backup dumps of the DB you'll need twice the amount of memory. Redis starts a second server instance to do the dump, which uses roughly the same amount of memory. So if you keep 4GB of Bayes in memory, you should keep a bit more than 4GB free for the dump. If the server starts swapping during the dump you're shot. You'll definitely want to do the dumps so Bayes survives reboots after a kernel update, for example.
>
> During the reboot, the Bayes plugin will just time out and log that it can't connect to the servers. Looks worse than it really is. When the server is available it reconnects automatically - no need for a spamd reload/restart.
>
> Enjoy

OK - I didn't think it was going to be that easy.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
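Axb's rule of thumb is that a dump needs roughly as much free memory again as Redis already uses. Strictly, a `BGSAVE` forks a child process that shares pages copy-on-write, so the extra memory depends on how much data changes while the dump runs; the worst case approaches a full second copy. A minimal sketch of a pre-dump check, assuming you already have the `redis-cli info` text and a free-memory figure in bytes (for example MemAvailable from /proc/meminfo, converted from kB); the function names and the 10% margin are hypothetical.

```python
def parse_redis_info(text: str) -> dict:
    """Parse `redis-cli info` output into a dict of raw string values.

    Lines look like "key:value"; "# Section" headers and blank lines are skipped.
    Only the first ":" splits, so "db2:keys=...,expires=..." keeps its value intact.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def bgsave_headroom_ok(info: dict, free_bytes: int, margin: float = 1.1) -> bool:
    """True if free memory covers a worst-case full copy of used_memory.

    The 1.1 margin is an arbitrary "a bit more than" cushion, not a Redis setting.
    """
    return free_bytes >= int(info["used_memory"]) * margin


# Values taken from the 3.0.5 INFO output posted earlier in this thread.
sample = """# Memory
used_memory:905452784
used_memory_human:863.51M
"""
info = parse_redis_info(sample)
print(info["used_memory_human"])                # 863.51M
print(bgsave_headroom_ok(info, 2 * 1024**3))    # True
print(bgsave_headroom_ok(info, 512 * 1024**2))  # False
```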
Re: Trying Bayes / Redis
Anyone using this rule timing plugin? Having trouble getting it to work. Just wondering if it's worth it? Mail::SpamAssassin::Plugin::RuleTimingRedis
Re: Trying Bayes / Redis
On 12/11/15 14:59, Axb wrote:
> On 12/11/2015 11:55 PM, Marc Perkel wrote:
>> So far so good. Thanks for your help. It looks like it's actually working. Have about 10k spam/ham tokens so far.
>
> sa-learn --dump magic
> should show if it's "alive"
>
> For safety, don't take your eyes off memory usage and play with:
> bayes_token_ttl
> bayes_seen_ttl
>
> I've set bayes_seen_ttl 1
>
> Curious what your "redis-cli info" looks like after a while of running... and if you're happy with speed and results... keep us posted...
>
> Axb

0.000 0 3 0 non-token data: bayes db version
0.000 0 14814 0 non-token data: nspam
0.000 0 11933 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 0 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

redis_version:2.4.10
redis_git_sha1:
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:472
uptime_in_seconds:4846
uptime_in_days:0
lru_clock:284149
used_cpu_sys:29.75
used_cpu_user:91.39
used_cpu_sys_children:4.31
used_cpu_user_children:49.34
connected_clients:671
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:185037872
used_memory_human:176.47M
used_memory_rss:182960128
used_memory_peak:193339376
used_memory_peak_human:184.38M
mem_fragmentation_ratio:0.99
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:42332
bgsave_in_progress:0
last_save_time:1449876339
bgrewriteaof_in_progress:0
total_connections_received:110460
total_commands_processed:9275568
expired_keys:0
evicted_keys:0
keyspace_hits:1821764
keyspace_misses:793471
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:16204
vm_enabled:0
role:master
db2:keys=2012232,expires=4

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
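Axb suggests `sa-learn --dump magic` as the liveness check. In both dumps posted in this thread, ntokens reads 0 even though nspam and nham keep climbing, so a scripted check is safer keyed off the message counts. A minimal sketch, assuming the layout shown above where the third field holds the value; the 200/200 thresholds mirror SpamAssassin's bayes_min_spam_num and bayes_min_ham_num defaults, below which Bayes is not used for scoring. The function names are hypothetical.

```python
def parse_dump_magic(text: str) -> dict:
    """Map each 'non-token data' label to its value (the third field)."""
    stats = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        numbers, _, label = line.partition("non-token data:")
        fields = numbers.split()
        if len(fields) >= 3:
            stats[label.strip()] = int(fields[2])
    return stats


def bayes_ready(stats: dict, min_spam: int = 200, min_ham: int = 200) -> bool:
    """True once enough spam and ham have been learned for Bayes to be used."""
    return stats.get("nspam", 0) >= min_spam and stats.get("nham", 0) >= min_ham


# First four lines of the dump posted above.
sample = """0.000 0 3 0 non-token data: bayes db version
0.000 0 14814 0 non-token data: nspam
0.000 0 11933 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
"""
stats = parse_dump_magic(sample)
print(stats["nspam"], stats["nham"], stats["ntokens"])  # 14814 11933 0
print(bayes_ready(stats))                               # True
```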
Trying Bayes / Redis
Trying to set up Bayes using redis. I created a server for redis and have it running. Can't find the docs on how to create the databases initially and such. Any tips appreciated. Thanks in advance. -- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
Re: Try my IXHASH
On 12/10/15 10:58, Bill Cole wrote:
> On 10 Dec 2015, at 13:25, Paul Stead wrote:
>> On 10/12/15 18:23, Paul Stead wrote:
>>> On 10/12/15 17:24, Bill Cole wrote:
>>>> On 10 Dec 2015, at 10:48, Paul Stead wrote:
>>>>> 0.004% hit rate on ham
>>>> Clarify this please:
>>>> 4 out of 100k hits are ham (not so bad)
>>>> OR
>>>> 4 out of 100k hams get hit (OUCH)
>>> The former, 4 out of 100k hit are ham emails
>> Re-clarifying - out of 100k ham emails, 4 of these hit on this iXhash
> So: unfit for a high score (e.g. the suggested 5) on a system receiving a lot of ham. Good to know.

I think 4 FPs out of 100,000 is really good. 58% overlap is more confirmation that a spam is a spam, but that means 42% is new spam not caught by other iXhash. So - not phenomenal - but not bad. Thanks for the feedback.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
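Bill's two readings of "0.004% hit rate on ham" differ only in the denominator, and Paul's re-clarification settles on the second one: hits divided by all ham. A small sketch of the arithmetic; the function names are hypothetical and the 1,000,000 hams/day volume is a made-up figure for illustration, not a number from the thread.

```python
def false_positive_rate(ham_hits: int, ham_total: int) -> float:
    """Fraction of all ham the rule hits (Paul's re-clarified reading)."""
    return ham_hits / ham_total


def ham_share_of_hits(ham_hits: int, total_hits: int) -> float:
    """Fraction of the rule's hits that turn out to be ham (the other reading)."""
    return ham_hits / total_hits


def expected_false_positives(ham_hits: int, ham_total: int, daily_ham: int) -> float:
    """Scale the measured FP rate to a site's own daily ham volume."""
    return daily_ham * ham_hits / ham_total


rate = false_positive_rate(4, 100_000)
print(f"{rate:.3%}")  # 0.004%
# Hypothetical site volume: 1,000,000 ham messages per day.
print(expected_false_positives(4, 100_000, 1_000_000))  # 40.0
```

At a score of 5, which is SpamAssassin's default required_score, a single hit can by itself push a message to the spam threshold, so on a high-volume system those expected FPs become tagged ham. That is the reasoning behind Bill's "unfit for a high score".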
Trying to understand how bayes works.
I've had Bayes disabled in SA because it doesn't seem able to keep working in a high-volume situation. The MySQL server can't seem to keep up with it even on very fast computers. But I'm thinking about trying something interesting: doing my own Bayes in a different way. Here's my question. Bayes breaks the message down into some sort of tokens and then does statistics on those tokens as to tokens found in spam vs. tokens found in ham. But what about combinations of tokens? I'm thinking that I'd like to have something that says when it sees tokens X and Y and Z then that's spam even though X, Y, Z might be in ham when not combined. Does Bayes do that, or is there anything that does? -- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
Re: Trying to understand how bayes works.
On 12/10/15 18:31, Benny Pedersen wrote:
> Marc Perkel wrote on 2015-12-10 22:54:
>> I've had bayes disabled in SA because it seems to not be able to stay working in a high volume situation. The MySQL server can't seem to keep up with it even on very fast computers.
> I've got a Palm Zire that can do OCR on handwritten text :=) Pretty good for the kind of CPU it has.
>> But - thinking about trying something interesting - doing my own bayes in a different way.
> I have tried bogofilter with very good success, and I see problems with bayes here as well. I remember you changed to MariaDB? At that time you said it worked better than MySQL? Did it fail again?
>> Here's my question. Bayes breaks the message down into some sort of tokens and then does statistics on those tokens as to tokens found in spam vs. tokens found in ham. But what about combinations of tokens? I'm thinking that I'd like to have something that says when it sees tokens X and Y and Z then that's spam even though X,Y,Z might be in ham when not combined. Does bayes do that or is there anything that does?
> If z is scored as spam, and x and y as ham, then it's ham; that's basically how bayes works, but a single mail might have lots of digests to compare before it says spam or not.
>
> Test bogofilter:
> put 100 spam mails in a spam folder
> put 100 non-spam mails in a ham folder
> train bogofilter with these 2 folders in one go, not first ham and then spam; it must be done in one bogofilter call
> train, configure the bogofilter.cf plugin for spamassassin, test it :=)
>
> YMMV

Yes, MariaDB was better than MySQL but not good enough to keep up. I even tried putting the database on a RAM disk and it still didn't work. I'm thinking about incorporating Bogofilter, but instead of feeding it messages I'm thinking about feeding it the SpamAssassin results (the rule names it hit + other data about the message) and then letting it score the rules. That's what I want to experiment with.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400
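Marc's experiment is to hand a Bayes-style learner the names of the rules a message hit instead of its words. A minimal sketch of what "letting it score the rules" could look like, assuming each message is already reduced to a list of rule-name tokens. This is a plain naive Bayes with +1 smoothing that only looks at tokens present in the message; it is not Bogofilter's or SpamAssassin's actual method (both use Robinson-style token probabilities with chi-square combining). The class name and the training tokens are hypothetical.

```python
import math
from collections import Counter


class RuleTokenBayes:
    """Tiny naive Bayes over rule-name tokens (illustrative sketch only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        """Count each distinct token once per message, per class."""
        if is_spam:
            self.spam_counts.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham_counts.update(set(tokens))
            self.n_ham += 1

    def spam_log_odds(self, tokens):
        """Prior log odds plus one log likelihood ratio per token present."""
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for t in set(tokens):
            # +1 / +2 smoothing keeps unseen tokens away from log(0)
            p_spam = (self.spam_counts[t] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_counts[t] + 1) / (self.n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score

    def spam_probability(self, tokens):
        """Squash the log odds into a 0..1 probability."""
        return 1 / (1 + math.exp(-self.spam_log_odds(tokens)))


# Made-up training data: MONEY appears equally in spam and ham.
nb = RuleTokenBayes()
for _ in range(20):
    nb.train(["ROYALTY", "MONEY"], is_spam=True)
    nb.train(["MONEY", "INVOICE"], is_spam=False)

print(nb.spam_probability(["ROYALTY", "MONEY"]) > 0.5)  # True
print(nb.spam_probability(["MONEY"]))                   # 0.5
print(nb.spam_probability(["INVOICE"]) < 0.5)           # True
```

Because the per-token terms are simply added, a neutral token such as MONEY contributes nothing, which matches Benny's description. It also shows why this kind of scorer cannot see "X and Y and Z together" unless combination tokens are added to the input explicitly.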
Re: Try my IXHASH
On 12/09/15 05:50, Rick Macdougall wrote:
> Hi,
>
> The messages it flags are messages that would have been caught without it. About 2% of messages it flags are not seen by any other markers.
>
> Regards,
>
> Rick

Any false positives? I suppose catching the same messages again just creates more confidence in the spam flagging.

-- Marc Perkel - Sales/Support supp...@junkemailfilter.com http://www.junkemailfilter.com Junk Email Filter dot com 415-992-3400