Re: Harvested Fresh .cn URIBL
Hi!

>> How does that simplify the problem? The difficulty is in getting data about when a domain was created.
> I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

If you rely on whois data and you do this for .nl, you will get fooled! The .nl whois only shows the date that the domain was *first* registered. If it expires and a year later someone registers that same domain, so it would be new for people, the whois still shows the date the first registration was done. If you want reliable data, especially for .cn, .hk and so on, you need a relationship with those registries. And so far (welcome to communism) not many people have succeeded there.

Bye, Raymond.
Re: Harvested Fresh .cn URIBL
Hi!

>> The other part of the problem is determining the age of a domain. The only way to do that absent a registrar feed is to do a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often. It would be nice if Google could help out here... if anyone besides the registrar knows when a domain first shows up on the net, it's them.
> They have no reason to help others filter spam. They have a competitive advantage. They would prefer everyone to use Gmail, or pay for their commercial spam filtering service for non-Gmail.

Warren, that's a pretty silly statement. Are you aware that Google helps out a lot of blacklists with data? Apparently not.

Bye, Raymond.
Re: SpamAssassin Ruleset Generation
On Tue, 2009-10-06 at 13:50 -0700, an anonymous Nabble user wrote:
> Other than the sought rules, all the rules are manually generated?

Actually, as has been said, I believe all stock rules are manually written. There are some third-party rule sets out there that are auto-generated -- not limited to Sought.

> Is there any statistics on how frequently new rules/regexes are adopted by SpamAssassin? Who are the people who write them? Any details related to it?

Somehow this begs the question -- why? Why are you asking? What are you ultimately interested in? And of course, did you even consider digging through the SVN repo, some docs on the wiki, and asking Google? Most of this should be pretty easy to find out if you're willing to read some.

-- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Spam Eating Monkey?
-- Original Message -- From: Warren Togami Date: Sun, 04 Oct 2009 19:42:06 -0400
> http://spameatingmonkey.com
>
> Anyone have any experience using these DNSBL and URIBL's?

I plugged these into my main.cf just before "permit", and therefore before the content filter, to see what increment SEM caught that all my other envelope filtering missed. I haven't put SEM into SA yet.

reject_rbl_client bl.spameatingmonkey.net,
reject_rhsbl_sender fresh15.spameatingmonkey.net,
reject_rhsbl_client fresh15.spameatingmonkey.net,
reject_rhsbl_sender uribl.spameatingmonkey.net,
reject_rhsbl_client uribl.spameatingmonkey.net,
reject_rhsbl_sender urired.spameatingmonkey.net,
reject_rhsbl_client urired.spameatingmonkey.net,

The total SEM rejects today, 00:00 to 14:30:

egrep -ic "reject:.*spameatingmonkey" maillog
3956

Breakdown:

egrep -ic "reject:.*bl\.spameatingmonkey" maillog
358
egrep -ic "reject:.*fresh.*\.spameatingmonkey" maillog
96
egrep -ic "reject:.*urired.*\.spameatingmonkey" maillog
3214
egrep -ic "reject:.*uribl.*\.spameatingmonkey" maillog
172

That's a very significant chunk out of the rejects that would have been done by content-scanning. Here are the helo's for the urired rejects.
In nearly all cases, the from and ptr domains were the same as the helo domain.

egrep -i "reject:.*urired.*\.spameatingmonkey" maillog | awk '{print $NF}' | sort -f | uniq -ic | sort -rnf | less

160 helo=<...>
104 helo=<...>
 ...
 18 helo=<123fiesta.net>
 ...
  5 helo=<52t.problemsroederer.com>
  4 helo=<24hb.whencorrections.com>
 ...
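The per-list breakdown computed by the egrep commands above can also be done in one pass. A minimal sketch, assuming Postfix-style reject lines (the sample log lines and the function name are illustrative, not taken from the post):

```python
import re
from collections import Counter

# Hypothetical, abbreviated Postfix reject lines (real lines carry more fields).
LOG_LINES = [
    "Oct  8 10:00:01 mx postfix/smtpd[123]: NOQUEUE: reject: RCPT blocked using urired.spameatingmonkey.net; helo=<example.cn>",
    "Oct  8 10:00:02 mx postfix/smtpd[124]: NOQUEUE: reject: RCPT blocked using bl.spameatingmonkey.net",
    "Oct  8 10:00:03 mx postfix/smtpd[125]: NOQUEUE: reject: RCPT blocked using urired.spameatingmonkey.net; helo=<spam.example>",
]

def sem_reject_counts(lines):
    """Count rejects per spameatingmonkey zone, like the egrep breakdown above."""
    pat = re.compile(r"reject:.*?(\w+)\.spameatingmonkey\.net", re.IGNORECASE)
    counts = Counter()
    for line in lines:
        m = pat.search(line)
        if m:
            counts[m.group(1).lower()] += 1
    return counts

print(sem_reject_counts(LOG_LINES))  # Counter({'urired': 2, 'bl': 1})
```

In practice you would feed this `open("/var/log/maillog")` instead of the sample list.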
Re: Harvested Fresh .cn URIBL
On 10/7/2009 8:01 PM, Rob McEwen wrote:
> Blaine Fleming wrote:
>> I know my users never see .cn domains in their inbox and if I didn't run a blacklist I wouldn't either.
> Which brings up an interesting idea. I wonder how many legit non-spam .cn domains exist? Surely it is a fraction of a percent of the # of .cn domains used for spam purposes, correct?

Nope, there are zillions, and growing. The same thought could apply to .com or .org or .net for Chinese mom & pop users. And it's still pretty off-topic on the SA users list and better placed on spam-l.com.
Re: Harvested Fresh .cn URIBL
On 10/07/2009 03:29 PM, Jason Bertoch wrote:
> John Hardin wrote:
>> The other part of the problem is determining the age of a domain. The only way to do that absent a registrar feed is to do a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.
> It would be nice if Google could help out here... if anyone besides the registrar knows when a domain first shows up on the net, it's them.

They have no reason to help others filter spam. They have a competitive advantage. They would prefer everyone to use Gmail, or pay for their commercial spam filtering service for non-Gmail.

Warren
Re: Harvested Fresh .cn URIBL
John Hardin wrote:
> The other part of the problem is determining the age of a domain. The only way to do that absent a registrar feed is to do a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.

It would be nice if Google could help out here... if anyone besides the registrar knows when a domain first shows up on the net, it's them.
Re: Harvested Fresh .cn URIBL
On Wed, 7 Oct 2009, Terry Carmen wrote:
> John Hardin wrote:
>> On Wed, 7 Oct 2009, Terry Carmen wrote:
>>> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?
>>> If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.
>> How does that simplify the problem? The difficulty is in getting data about when a domain was created.
> I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

The other part of the problem is determining the age of a domain. The only way to do that absent a registrar feed is to do a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.

-- John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
jhar...@impsec.org  FALaholic #11174  pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
We should endeavour to teach our children to be gun-proof rather than trying to design guns to be child-proof
---
6 days since a sunspot last seen - EPA blames CO2 emissions
Re: Harvested Fresh .cn URIBL
Terry Carmen wrote:
> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?
> If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

How do you determine age? Whois queries really can't be used, for the reasons I mentioned in my previous post. I guess one option is to record the date that the domain was first seen anywhere and then work from that, but what about domains that are rarely used? One of the big problems I see with trying to look at .cn domains in the wild is the lack of data. How many people here deal with a large volume of mail that would have legitimate .cn domains? I seem to remember recently seeing that one of the big blacklists didn't have enough non-English mail to work with.

--Blaine
Re: Harvested Fresh .cn URIBL
John Hardin wrote:
> On Wed, 7 Oct 2009, Terry Carmen wrote:
>> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?
>> If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.
> How does that simplify the problem? The difficulty is in getting data about when a domain was created.

I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

Terry
Re: Harvested Fresh .cn URIBL
On Wed, 2009-10-07 at 14:01 -0400, Rob McEwen wrote:
> I might be interested in maintaining such a freely available list--accessible via rsync at no charge--if someone else would come up with the SA rule or plugin.

I'd be interested in giving this a shot, as a proof-of-concept. Any details and further discussion off-list.

guenther
Re: Harvested Fresh .cn URIBL
>> Spam from .cn domains can be mitigated with the right rules and querying multiple lists. I know my users never see .cn domains in their inbox and if I didn't run a blacklist I wouldn't either.

Indeed, spam with .cn URIs really doesn't appear to be a problem at all. They are well covered by the existing URI DNSBLs (which are doing an awesome job, btw) and the rest of the SA rules. There is no value in additional rules that catch already high-scoring spam. It's the low scorers that need our attention.

> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?

It has been pointed out before, but it still is kind of funny how this thread re-invents existing techniques and re-iterates the very same, often-discussed problems. And keeps doing so. I haven't seen anything new so far. The design of a BL is pretty off-topic here anyway, even more so on the users list.

guenther
Re: Harvested Fresh .cn URIBL
On Wed, 7 Oct 2009, Terry Carmen wrote:
> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?
> If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

How does that simplify the problem? The difficulty is in getting data about when a domain was created.

-- John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
jhar...@impsec.org  FALaholic #11174  pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Of the twenty-two civilizations that have appeared in history, nineteen of them collapsed when they reached the moral state the United States is in now. -- Arnold Toynbee
---
6 days since a sunspot last seen - EPA blames CO2 emissions
Re: Harvested Fresh .cn URIBL
Blaine Fleming wrote:
> Warren Togami wrote:
>> Opinions of this proposal?
> I would love to have a listing of recently registered .cn domains but until the TLD operator starts working with us that just isn't going to happen. Trying to perform a whois lookup on every domain is painfully slow. Once you get a high enough volume of .cn domains detected it will become impossible and that is assuming you are never rate limited. On top of that, most of the time when I do a whois lookup on a .cn domain I find the destination whois server to be unresponsive, stuck in a "maintenance mode" or doesn't include any data except the domain name and the listed nameservers. Spam from .cn domains can be mitigated with the right rules and querying multiple lists. I know my users never see .cn domains in their inbox and if I didn't run a blacklist I wouldn't either.

Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?

If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

Terry
Re: Harvested Fresh .cn URIBL
Blaine Fleming wrote:
> I know my users never see .cn domains in their inbox and if I didn't run a blacklist I wouldn't either.

Which brings up an interesting idea. I wonder how many legit non-spam .cn domains exist? Surely it is a fraction of a percent of the number of .cn domains used for spam purposes, correct? If that assertion is right, then maybe it makes more sense to build a really good and comprehensive ".cn" whitelist. Then create a rule in SA whereby ".cn" domains not on that whitelist would add a point or two to the score. (It shouldn't be used to outright block, due to the "guilty until proven innocent" stance, and it shouldn't be a default SA rule.) I might be interested in maintaining such a freely available list, accessible via rsync at no charge, if someone else would come up with the SA rule or plugin. My unique contribution could be providing a means for my own "invaluement URI ratings engine" to rate potential candidates for whitelisting; this would separate most of the wheat from the chaff with little effort, just as long as the entries submitted were kept to a reasonably low volume.

-- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
Re: Harvested Fresh .cn URIBL
Warren Togami wrote:
> Opinions of this proposal?

I would love to have a listing of recently registered .cn domains, but until the TLD operator starts working with us that just isn't going to happen. Trying to perform a whois lookup on every domain is painfully slow. Once you get a high enough volume of .cn domains detected it will become impossible, and that is assuming you are never rate limited. On top of that, most of the time when I do a whois lookup on a .cn domain I find the destination whois server unresponsive, stuck in a "maintenance mode", or returning no data except the domain name and the listed nameservers. Spam from .cn domains can be mitigated with the right rules and by querying multiple lists. I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.

--Blaine
Re: Harvested Fresh .cn URIBL
On 10/07/2009 11:27 AM, Raymond Dijkxhoorn wrote:
> We are working on getting .CN zone access. That's the only way to speed things up. The only challenging part is to get a copy of the .CN zone just like we get copies of other ccTLDs/gTLDs.

OK, I was under the impression that it was impossible to obtain zone access. If this happens, then great!

Warren
Re: Harvested Fresh .cn URIBL
Hi Warren!

>> It seems then the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.
> you haven't really looked into Spamhaus data & SA rules, have you? have you looked into SURBL/URIBL's data & datafeeds? what's the deal here? do you not represent RedHat? have no access to RH spam data? just can't imagine RH can't provide itself or the community with plenty of spam data.
>> Opinions of this proposal?

How many whois lookups do you think you can do a day till you get blocked? If you can get a copy of the .CN root zone, existing blacklists can do the work for you. If not, give the current blacklists like SURBL and URIBL a bit more credit. We are doing that exact same thing. You describe, or reinvent, the DOB list. Your method is loose and dangerous, while other blacklists already have mechanisms in place to avoid false positives.

We are working on getting .CN zone access. That's the only way to speed things up. The only challenging part is to get a copy of the .CN zone just like we get copies of other ccTLDs/gTLDs. What you describe isn't new and isn't exciting. For me these are daily things. It's routine. Check SURBL and URIBL. Try to understand how those lists work. In the last 48 hours we added 1211 .CN domains into SURBL. Check for example how URIBL GOLD works.

Remember that a large part of the filtering tips and tricks is not to talk about what you are doing; just do it. Telling in detail how you do things will only give spammers an advantage and doesn't bring the community any good. You seem to have just entered the SA world, and you'd like to help. That's good, but don't end up making a whole lot of noise. If RH is serious about this, attend conferences like MAAWG and talk with people there; talk with the blacklist guys, many are at those events. Don't flood people on the users list, please.
Most likely there are better lists to start talks like this. Bye, Raymond.
Re: Harvested Fresh .cn URIBL
On 10/7/2009 5:00 PM, Warren Togami wrote:
> It seems then the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.

You haven't really looked into Spamhaus data & SA rules, have you? Have you looked into SURBL/URIBL's data & datafeeds? What's the deal here? Do you not represent Red Hat? Have no access to RH spam data? I just can't imagine RH can't provide itself or the community with plenty of spam data.

> Opinions of this proposal?

Sorry - imo, reinventing the wheel some time too late.
Harvested Fresh .cn URIBL
http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A very sizeable amount of spam (currently 50%) contains .cn domains that were registered very recently. They keep registering new domains in order to keep ahead of the URIBL's.

http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_EIGHT/detail
Last month, I noticed that a very sizeable percentage of the .cn spam used fresh and random \w{8}.cn domain names.

http://ruleqa.spamassassin.org/20091007-r822624-n/T_CN_SEVEN/detail
I don't know if it was due to our discussion here, but for whatever reason I began seeing new spam with \w{7}.cn domains registered since October 3rd, and \w{8}.cn seems to be tapering off now.

http://spameatingmonkey.com/lists.html#SEM-FRESH
\w{8}.cn, or any other length, is unsafe to use as a real rule. The only safe way to detect these fresh .cn domains would be a URIBL. But URIBL's like SEM-FRESH described here are only capable of knowing new domains for TLD's that provide zone files that can be compared.

It seems then the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.

A targeted URIBL verified by whois for registration dates would be near 100% accurate and deserving of a high score. This would hopefully break the economic feasibility of .cn URI spam by rendering fresh domains quickly useless. This could be a new URIBL, or an existing URIBL. If it is an existing URIBL, SpamAssassin can use meta rules to boolean-match .cn domains and assign a higher score. Example:

meta FRESHCN_7 SOME_URIBL && CN_URL
score FRESHCN_7 0 4.0 0 4.0

Spam Trap Workflow
==================
1. Spam trap receives spam containing a .cn URI.
2. Look up locally: is this .cn domain already known?
3. If already known, stop.
4. Look up the A record of this domain. If NXDOMAIN, stop.
5. Record the domain in the database with an UNKNOWN registration date.
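The five trap-side steps above could be sketched roughly as follows. All names here (`trap_intake`, the dict "database", the injected resolver) are illustrative assumptions, not an existing API; a real trap would do a live A-record lookup instead of the stub:

```python
# Sketch of the spam-trap intake steps (names are illustrative, not a real API).
UNKNOWN = None  # registration date not yet fetched via whois

def trap_intake(domain, db, resolve_a):
    """Steps 1-5: record a newly seen .cn domain with an UNKNOWN registration date.

    db        -- mapping of domain -> registration date (or UNKNOWN)
    resolve_a -- callable returning an A record string, or None on NXDOMAIN
    """
    if domain in db:               # steps 2-3: already known, stop
        return False
    if resolve_a(domain) is None:  # step 4: NXDOMAIN, stop
        return False
    db[domain] = UNKNOWN           # step 5: queue for a later whois lookup
    return True

# Usage with a stub resolver (203.0.113.x is a TEST-NET address):
db = {}
fake_dns = {"freshspam1.cn": "203.0.113.9"}
resolve = lambda d: fake_dns.get(d)
trap_intake("freshspam1.cn", db, resolve)  # recorded
trap_intake("freshspam1.cn", db, resolve)  # already known -> False
trap_intake("gone.cn", db, resolve)        # NXDOMAIN -> False
print(db)  # {'freshspam1.cn': None}
```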
URIBL Generation Workflow
=========================
1. If a domain has an UNKNOWN registration date, attempt a whois lookup. Record the registration date if found.
2. Ignore all UNKNOWN records.
3. Dump all domains registered in the last 7 days into one zone.
   score FRESHCN_7 0 4.0 0 4.0
4. Dump all domains registered in the last 14 days into another zone.
   score FRESHCN_14 0 2.0 0 2.0
5. Stop listing anything older than 14 days. By then the regular URIBL's have listed these domains.
6. Do not delete older .cn domains. Keeping them in the database prevents redundant whois lookups later.

The only challenging part here is whois lookup rate limiting. Whois lookups are critical to populating this URIBL, but they are a resource that can only be used in small quantities. The above workflow attempts to minimize the number of whois lookups. Given that only spammers would send mail to a trap, the number of .cn domain names might be small enough to handle whois lookups.

The goal here is to break the economic model. I'm told that .cn domains cost $3-10 each to register, and whois lookups are certainly cheaper to automate. I can't find a published whois rate limit for CNNIC. In any case, it wouldn't be difficult for us to proxy whois lookups to bypass rate limits should that become necessary.

Opinions of this proposal? Is anyone from PSBL, HOSTKARMA, or SEM interested?

Warren Togami wtog...@redhat.com
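Steps 2-5 of the generation workflow amount to bucketing known-age domains by age. A minimal sketch under the same assumptions as above (the dict "database", `build_zones`, and the sample domains are all illustrative; dates are fixed so the example is deterministic):

```python
from datetime import date

UNKNOWN = None  # registration date not yet fetched via whois

def build_zones(db, today):
    """Steps 2-5: split known-age domains into 7-day and 14-day zones,
    skipping UNKNOWN records and listing nothing older than 14 days."""
    fresh7, fresh14 = [], []
    for domain, reg in db.items():
        if reg is UNKNOWN:        # step 2: wait until whois succeeds
            continue
        age = (today - reg).days
        if age <= 7:              # step 3: FRESHCN_7 zone
            fresh7.append(domain)
        elif age <= 14:           # step 4: FRESHCN_14 zone
            fresh14.append(domain)
        # step 5: older entries stay in the db (step 6) but are listed nowhere
    return fresh7, fresh14

db = {
    "aaaa1111.cn": date(2009, 10, 5),  # 2 days old
    "bbbb2222.cn": date(2009, 9, 27),  # 10 days old
    "cccc3333.cn": date(2009, 9, 1),   # too old to list
    "dddd4444.cn": UNKNOWN,            # whois not done yet
}
print(build_zones(db, today=date(2009, 10, 7)))
# (['aaaa1111.cn'], ['bbbb2222.cn'])
```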
Re: SIGCHLD query
On Wed, 2009-10-07 at 14:31 +0200, Per Jessen wrote:
> Okay, I ran a check on my logs since midnight - yes, I also see a lot of child processes running for less than 10 secs, in fact slightly more than 50%. Interesting issue.

Here's the result of a scan across all my mail logs:

Processing file /var/log/maillog*
3544 Messages found
3538 Results (99.8%)
   6 SIGCHLDs caught (0.2%)

                      min    avg     max
Message size:         353   7340  496682
Scan time (secs):     0.5    2.3    34.5

I've checked all the SIGCHLD log lines. The previous scans by those children were all in the range 1.0 to 3.1 seconds. I'm using the default child population and the default --timeout-child of 300 secs.

Martin
Re: consolidating DNSBLs into a single query (was Spam Eating Monkey?)
Mike Cardwell wrote:
> I don't understand the logic of that. Ie, why you'd need to use bitmasking? zen.spamhaus.org is a combination of various different lists and returns multiple values like this:

If every list is an "outright block" list, then you are correct. My point applies to situations where some lists are used in scoring mode, and where there is a desire to calculate a score based on exactly which lists hit on a particular sending IP. But even if someone tries this with all "outright block" lists, and uses rbldnsd's built-in ability to consolidate lists, there are still two problems: (a) for auditing purposes, there'd be no way to tell *which* lists hit on that IP, since many use the same return codes; and (b) some hundreds-of-MB-large lists which previously could have used the lower-memory "ip4tset" would have to revert to the slower, higher-memory-usage "ip4set", fwiw. Again, I'm not saying these problems can't be solved, only pointing them out so that anyone who cares to try knows what to do and what to expect.

-- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
Re: SIGCHLD query
Per Jessen wrote: > Martin Gregorie wrote: >>> Yeah - maybe there is some indication in the log? I think there is >>> a switch that determines how many emails a child will process before >>> needing restart. (just looked it up: --max-conn-per-child) >>> I just checked my logs, during the last 9 hours I have 6016 of >>> these: >>> >>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to >>> SIGCHLD >>> >>> Is that the one you mean? >>> >> That's the only log message I've seen. Sometimes you can associate it >> with a scan that exceeded --timeout-child seconds and sometimes, much >> more rarely, it happens after a scan taking two or three seconds. > > I don't know if that is happening on my systems too, I haven't > checked. Okay, I ran a check on my logs since midnight - yes, I also see a lot of child processes running for less than 10secs, in fact slightly more than 50%. Interesting issue. /Per Jessen, Zürich
Re: SIGCHLD query
> Yeah - maybe there is some indication in the log? I think there is a > switch that determines how many emails a child will process before > needing restart. (just looked it up: --max-conn-per-child) > I just checked my logs, during the last 9 hours I have 6016 of these: > > spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD > > Is that the one you mean? > That's the only log message I've seen. Sometimes you can associate it with a scan that exceeded --timeout-child seconds and sometimes, much more rarely, it happens after a scan taking two or three seconds. Tuning would be easier if there was some indication about why a scan had terminated - maybe it could be added to the statistics list in the 'results' log line. > There are also arguments for controlling minimum/maximum number of spare > child processes - if your load varies, and you have a significant > difference between min and max, I could see that leading to more child > processes stopping and starting. > Does the parent or the child determine whether the child stays alive after completing a scan or whether it should terminate? Martin
Re: SIGCHLD query
Martin Gregorie wrote:
>> Yeah - maybe there is some indication in the log? I think there is a switch that determines how many emails a child will process before needing a restart. (Just looked it up: --max-conn-per-child.) I just checked my logs, during the last 9 hours I have 6016 of these:
>>
>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
>>
>> Is that the one you mean?
> That's the only log message I've seen. Sometimes you can associate it with a scan that exceeded --timeout-child seconds and sometimes, much more rarely, it happens after a scan taking two or three seconds.

I don't know if that is happening on my systems too, I haven't checked. I wonder if the latter could be caused by the maintenance of spare child processes?

There are also arguments for controlling the minimum/maximum number of spare child processes - if your load varies, and you have a significant difference between min and max, I could see that leading to more child processes stopping and starting.

> Does the parent or the child determine whether the child stays alive after completing a scan or whether it should terminate?

It's the child that determines that: "Uh, I've done X scans, all done." It's just a for-loop, roughly: for (i = 0; i < max_conn_per_child; i++) { handle_one_connection(); }
Re: consolidating DNSBLs into a single query (was Spam Eating Monkey?)
On 07/10/2009 05:19, Rob McEwen wrote:
> Also, this loses the ability to *score* on multiple lists... unless you use a bitmasked scoring system whereby one list gets assigned ".2", another ".4", another ".8", on to ".128". But that leaves a maximum of only 7 lists. Sure, you can add more than 7 by employing other octets in the "answer IP", but that only severely complicates matters. And as it stands, you'd also have the complexity of getting the spam filter to parse, understand, and react properly to those bitmasks.

I don't understand the logic of that, i.e. why you'd need to use bitmasking. zen.spamhaus.org is a combination of various different lists and returns multiple values like this:

m...@haven:~$ host -t a 2.0.0.127.zen.spamhaus.org
2.0.0.127.zen.spamhaus.org A 127.0.0.4
2.0.0.127.zen.spamhaus.org A 127.0.0.10
2.0.0.127.zen.spamhaus.org A 127.0.0.2
m...@haven:~$

It's perfectly easy for SpamAssassin to see that three different values have been returned, so 127.0.0.2 is on three separate lists and an extra score should be applied for each of the three. It's also quite easy to do in Exim. E.g., if I wanted to block an email in Exim when the sending IP is on both sbl.spamhaus.org and xbl.spamhaus.org, I could either do two DNS lookups like this:

deny dnslists = sbl.spamhaus.org
     dnslists = xbl.spamhaus.org

Or I could do it with a single DNS lookup like this:

deny dnslists = zen.spamhaus.org=127.0.0.2
     dnslists = zen.spamhaus.org=127.0.0.4

You can be 100% backwards compatible by leaving all of your lists as they are, but then adding another one which is a combined version of all of them...

-- Mike Cardwell - IT Consultant and LAMP developer Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/
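The decoding Mike describes, mapping each distinct answer value from a combined zone back to its component list, is straightforward. A sketch using the Spamhaus zen return codes shown in his lookup (the scoring function name is illustrative; SpamAssassin does this internally with per-code RBL subrules):

```python
# Map zen.spamhaus.org answer values to component lists, per Mike's example.
ZEN_CODES = {
    "127.0.0.2":  "SBL",
    "127.0.0.4":  "XBL",
    "127.0.0.10": "PBL",
}

def lists_hit(answers):
    """Map the A records from one combined-zone lookup to the component lists."""
    return [ZEN_CODES.get(a, "unknown:" + a) for a in answers]

# The lookup in the post returned three records for 127.0.0.2:
answers = ["127.0.0.4", "127.0.0.10", "127.0.0.2"]
print(lists_hit(answers))  # ['XBL', 'PBL', 'SBL']
```

One DNS query thus answers "which of the three lists is this IP on?", which is the whole point of the combined zone: no bitmasking needed, just distinct return values.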
Re: SIGCHLD query
Martin Gregorie wrote:
> On Tue, 2009-10-06 at 23:16 +0200, Per Jessen wrote:
>> Martin, generally speaking, the parent can only report the signal and that the child has gone away. The child would have to report on why.
> OK, rephrase that to "a pity the child doesn't say why it's generating a SIGCHLD signal".

Yeah - maybe there is some indication in the log? I think there is a switch that determines how many emails a child will process before needing a restart. (Just looked it up: --max-conn-per-child.) I just checked my logs; during the last 9 hours I have 6016 of these:

spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD

Is that the one you mean?

There are also arguments for controlling the minimum/maximum number of spare child processes - if your load varies, and you have a significant difference between min and max, I could see that leading to more child processes stopping and starting.

/Per