Re: Non-English languages
Kenneth Porter wrote:
> Same here. I took a couple years of high school Spanish in California and
> the classes dragged so incredibly slowly that I learned just a little
> vocabulary and the most basic of grammar, and still led the class. I
> usually finished my physics homework in that class while waiting for
> everyone to catch up.
>
> As a programmer I envy my professional peers who can speak Japanese and
> other non-European languages. My interest in programming languages extends
> to natural languages, and I find their differences fascinating.
>
> To those of you who've successfully learned 2nd and 3rd languages as an
> adult, what do you recommend for accomplishing that?

Comic books. Or "bande dessinée" as it's called in French. The story lines are often simple, and the pictures give a lot of context to what is being talked about.

-Philip
Re: Non-English languages (was: xxxl spam)
On Apr 13, 2006, at 9:46 PM, Kenneth Porter wrote:
> On Thursday, April 13, 2006 10:32 PM -0600 "Paul R. Ganci" <[EMAIL PROTECTED]> wrote:
>> Unfortunately I am still a linguistic idiot and only speak English ... a
>> Buffalo, NY version at that! My grandparents came over from Italy in 1920
>> and promptly stopped speaking Italian around my parents. It forced my
>> parents to learn English at the cost of never learning Italian. There is
>> plenty of room to accommodate two languages but neither the US education
>> system nor home life is set up to do it.
>
> Same here. I took a couple years of high school Spanish in California and
> the classes dragged so incredibly slowly that I learned just a little
> vocabulary and the most basic of grammar, and still led the class. I
> usually finished my physics homework in that class while waiting for
> everyone to catch up.
>
> As a programmer I envy my professional peers who can speak Japanese and
> other non-European languages. My interest in programming languages extends
> to natural languages, and I find their differences fascinating.
>
> To those of you who've successfully learned 2nd and 3rd languages as an
> adult, what do you recommend for accomplishing that?

I wish I had stuck with German in HS. And I wish I had taken the time to learn Latin and/or Greek back when I had all of that free time on my hands in HS. These days, it seems like everyone* ought to know (in addition to English) Spanish, and then a choice of French, Chinese, or Japanese.

(* in the US, I don't mean globally; globally, I'd probably say that we should all know 3 out of those 5, but that's just me making wild-a*s-suggestions for a world that doesn't care about my opinion ;-) )

And, reiterating Kenneth's question: Anyone have advice for an almost middle-aged person who wants to go about expanding his natural language capabilities?

(Hmm.. that's probably a dumb question for me.. I think all of those are taught at the university where I work, and I can take free classes; I could add Italian, Latin, and Greek too. Still, for everyone who doesn't work for a university but has a similar thought, it's a good question to ponder.)
Re: Getting spamassassin not to bother checking outgoing mail
Rob Tanner wrote:
> Hi,
>
> I installed spamassassin on my server a week ago and along with a number
> of Postfix settings, I'm nearly 100% spam free (I might get one spam a
> day now). But one thing I haven't figured out. I would like not to check
> mail originating in my address space. Is that a spamassassin setting or
> something I need to do in postfix?

Postfix. Being a filter, SpamAssassin will scan anything passed to it.

Daryl
Getting spamassassin not to bother checking outgoing mail
Hi,

I installed spamassassin on my server a week ago and along with a number of Postfix settings, I'm nearly 100% spam free (I might get one spam a day now). But one thing I haven't figured out. I would like not to check mail originating in my address space. Is that a spamassassin setting or something I need to do in postfix?

Thanks,
Rob

--
Rob Tanner            DRACO DORMIENS NUNQUAM
[EMAIL PROTECTED]     TITILLANDUS
Non-English languages (was: xxxl spam)
On Thursday, April 13, 2006 10:32 PM -0600 "Paul R. Ganci" <[EMAIL PROTECTED]> wrote:
> Unfortunately I am still a linguistic idiot and only speak English ... a
> Buffalo, NY version at that! My grandparents came over from Italy in 1920
> and promptly stopped speaking Italian around my parents. It forced my
> parents to learn English at the cost of never learning Italian. There is
> plenty of room to accommodate two languages but neither the US education
> system nor home life is set up to do it.

Same here. I took a couple years of high school Spanish in California and the classes dragged so incredibly slowly that I learned just a little vocabulary and the most basic of grammar, and still led the class. I usually finished my physics homework in that class while waiting for everyone to catch up.

As a programmer I envy my professional peers who can speak Japanese and other non-European languages. My interest in programming languages extends to natural languages, and I find their differences fascinating.

To those of you who've successfully learned 2nd and 3rd languages as an adult, what do you recommend for accomplishing that?
Re: xxxl spam
Loren Wilton wrote:
> I predict that the US will be the first country in the 21st century to
> abandon English as the national language, while almost all other
> countries seem to be mandating that their citizens learn English.
>
> Loren

The problem with the US is that we are linguistic idiots (a quote from a Columbia University German professor). If you go to Europe, in general they speak at least two languages fluently: English and the country's native language. I have had the opportunity to work in both Geneva, Switzerland and Milan, Italy. All business is conducted in English and everything else in Italian or, in the case of Switzerland, either German, Swiss German or French. Essentially all the engineers with whom I worked could speak two languages or in some cases four.

I don't know what the big deal is. It shouldn't be "one" language but at least two here in the US. Start young when it is easy for kids to pick up the sounds.

Unfortunately I am still a linguistic idiot and only speak English ... a Buffalo, NY version at that! My grandparents came over from Italy in 1920 and promptly stopped speaking Italian around my parents. It forced my parents to learn English at the cost of never learning Italian. There is plenty of room to accommodate two languages but neither the US education system nor home life is set up to do it.

--
Paul ([EMAIL PROTECTED])
Re: xxxl spam
> states like California where it could matter (reducing costs in govt
> overhead by eliminating multiple languages and the requirement for
> multilingual workers), the "English as state language" supporters are
> afraid of what almost happened in Florida.

Considering that at last census a "minority" of 54% of California residents spoke Spanish as their primary or only language...

I predict that the US will be the first country in the 21st century to abandon English as the national language, while almost all other countries seem to be mandating that their citizens learn English.

Loren
Re: Haven't seen this one before... "Premature padding of base64 data"
Philip Prindeville wrote:
> Apr 13 16:57:06 mail mimedefang-multiplexor[11341]: Slave 8 stderr:
> Premature padding of base64 data at
>
> Any ideas? Didn't see any mention of it in previous postings...

Looks like someone screwed up their base-64 encoding.

Base64 encodes into "quartets", where 3 8-bit bytes get encoded as 4 ASCII characters containing 6 bits of data each, so they can fit into ASCII-text ranges.

At the end of the input, Base64 is normally padded out with = characters to make a full quartet if the input length is not a multiple of 3 bytes (and so would not make a complete quartet).

Because it's a 3->4 encoding, even one byte of input generates two characters of encoded output, the first holding 6 of the 8 input bits, and the next holding the remaining 2. In this case, the last two characters of the quartet get filled with = as a pad.

Think of the base-64 input as a series of 3 8-bit bytes, like so:

12345678 12345678 12345678

That input gets re-split into 4 pieces of 6 bits each, like this:

123456 781234 567812 345678

But a short input:

12345678

encodes as something like:

123456 78 '=' '='

The error message you see means that an = was inserted in the first or second position of the last quartet of encoded data. That can never happen unless the data is invalid or corrupted. Either some bytes were dropped, resulting in a base64 encoding that's not a multiple of 4 characters and causing a pad to get shifted up, or more than 2 pads exist at the end.
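To see the padding rule in action, here is a minimal Python sketch. It only illustrates the encoding described above using Python's standard base64 module; it is not how MIME::Decoder::Base64 performs its check, and the helper names quartets and is_valid_base64 are made up for this example.

```python
import base64
import binascii


def quartets(data: bytes) -> list:
    """Return the base64 encoding of `data` split into 4-character groups."""
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + 4] for i in range(0, len(encoded), 4)]


def is_valid_base64(text: str) -> bool:
    """Strictly decode `text`; misplaced or excess '=' padding is rejected."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False


print(quartets(b"abc"))          # 3 input bytes: one full quartet, no pad
print(quartets(b"ab"))           # 2 input bytes: one '=' in the last position
print(quartets(b"a"))            # 1 input byte: '=' in the last two positions
print(is_valid_base64("YQ=="))   # pads only in positions 3 and 4
print(is_valid_base64("Y=Q="))   # a pad in position 2 is rejected
```

A pad in the first or second position of a quartet, or more than two pads at the end, makes the strict decode fail, which is the same class of corruption the log message is complaining about.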
Haven't seen this one before... "Premature padding of base64 data"
This appeared in my logs. Running 3.1.1 on Linux FC3 (x86_64) with Sendmail 8.13.1 and Mimedefang 2.56:

Apr 13 16:57:05 mail sendmail[23371]: NOQUEUE: connect from lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter (mimdefang): init success to negotiate
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: Milter: connect to filters
Apr 13 16:57:05 mail mimedefang.pl[22325]: helo: lists-outbound.sourceforge.net (66.35.250.225) said "helo lists-outbound.sourceforge.net"
Apr 13 16:57:05 mail sendmail[23371]: k3DMv5s4023371: from=<[EMAIL PROTECTED]>, size=15309, class=-60, nrcpts=1, msgid=<[EMAIL PROTECTED]>, proto=ESMTP, daemon=MTA-v4, relay=lists-outbound.sourceforge.net [66.35.250.225]
Apr 13 16:57:06 mail mimedefang-multiplexor[11341]: Slave 8 stderr: Premature padding of base64 data at /usr/lib/perl5/vendor_perl/5.8.5/MIME/Decoder/Base64.pm line 109.
Apr 13 16:57:07 mail mimedefang.pl[22325]: k3DMv5s4023371: hits=18.463, req=5, names=DATE_IN_PAST_96_XX,FORGED_MSGID_MSN,HTML_IMAGE_ONLY_12,HTML_MESSAGE,HTML_SHORT_LINK_IMG_1,L_ALSA_DEVEL,MIME_HTML_ONLY,MSGID_SHORT,SPF_PASS,URIBL_SBL,URIBL_WS_SURBL
Apr 13 16:57:07 mail mimedefang.pl[22325]: MDLOG,k3DMv5s4023371,spam,18.463,66.35.250.225,<[EMAIL PROTECTED]>,<[EMAIL PROTECTED]>,[Alsa-devel] Your mortagee approval
Apr 13 16:57:07 mail mimedefang.pl[22325]: filter: k3DMv5s4023371: bounce=1 discard=1
Apr 13 16:57:07 mail mimedefang[11357]: k3DMv5s4023371: Bouncing because filter instructed us to
Apr 13 16:57:07 mail sendmail[23371]: k3DMv5s4023371: Milter: data, reject=554 5.7.1 Message rejected; scored too high on the Spam test.

Any ideas? Didn't see any mention of it in previous postings...

Interesting msg-id. Hmmm. Already a rule for that. Good...

-Philip
RE: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '
Fixed the problem. Backed up the bayes tables with sa-learn --backup, and saved the userpref and awl tables with mysqldump. Then deleted the entire database, set everything to utf8 in my.cnf, and recreated the database and tables using utf8 as the default character set. Then restored from backup with sa-learn --restore and recreated the awl and userpref tables from the mysqldump files (after editing them to use utf8 as the default character set).

Just in case anyone else has this problem in the future...
Re: Proper use of user_prefs "whitelist"
Daryl C. W. O'Shea wrote:
> Your whitelist entries don't match "[EMAIL PROTECTED]".
>
> This should work (note the *@):
> whitelist_from_rcvd [EMAIL PROTECTED] hermes.apache.org
>
> This would work, but would be trivially forged:
> whitelist_from [EMAIL PROTECTED]

If you use the SPF plugin, another, very simple, way would be:

whitelist_from_spf [EMAIL PROTECTED]

Works great here. I'd also suggest:

bayes_ignore_to users@spamassassin.apache.org
bayes_ignore_to spamassassin-users@incubator.apache.org
bayes_ignore_from [EMAIL PROTECTED]

to inhibit any bayes autolearning of list posts.
Re: Proper use of user_prefs "whitelist"
Forrest Aldrich wrote:
> I've been having some difficulty with the user_prefs and the whitelist_*
> functions. I read the examples etc, and I believe these are correct, but
> clearly certain email is still being tagged (see below). I wonder if
> someone can help clarify what I'm doing wrong here.
>
> First, here are the directives in my ~/.spamassassin/user_prefs file, as
> it applies to this instance:
>
> whitelist_from_rcvd spamassassin.apache.org hermes.apache.org
> whitelist_from *.apache.org
>
> Here is the Sendmail log, showing the rejection:
>
> Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951: from=<[EMAIL PROTECTED]>,

Your whitelist entries don't match "[EMAIL PROTECTED]".

This should work (note the *@):

whitelist_from_rcvd [EMAIL PROTECTED] hermes.apache.org

This would work, but would be trivially forged:

whitelist_from [EMAIL PROTECTED]

Daryl
Re: New bayes poison
On Thu, Apr 13, 2006 at 11:45:07PM +0200, Michael Monnerie wrote:
> >  0.0 DK_POLICY_SIGNSOME  Domain Keys: policy says domain signs some mails
> >  0.0 DK_POLICY_TESTING   Domain Keys: policy says domain is testing DK
> >  0.0 DK_SIGNED           Domain Keys: message has a signature
> > -0.0 DK_VERIFIED         Domain Keys: signature passes
>
> Where to get these rules?

They're standard in 3.1 if you have enabled the Mail::SpamAssassin::Plugin::DomainKeys plugin.

--
Randomly Generated Tagline:
"Note that I am a proponent of Zen in the Art of Systems Administration, and thus believe that it's appropriate to present yourself as a beginner in all things. This helps you keep a fresh perspective and spank the unsuspecting at snooker." - Benjy Feen
Re: SpamAssassin BZ downtime
Justin Mason wrote:
> http://ajax.apache.org/%7ejefft/ :
>
> Bugzilla is moving to a new host, and is temporarily down while the
> database synchs. Apologies for the inconvenience.
>
> --j.

Yay, it doesn't seem excruciatingly slow anymore.
Re: New bayes poison
On Thursday, 13 April 2006 19:05 Justin Mason wrote:
>  0.0 DK_POLICY_SIGNSOME  Domain Keys: policy says domain signs some mails
>  0.0 DK_POLICY_TESTING   Domain Keys: policy says domain is testing DK
>  0.0 DK_SIGNED           Domain Keys: message has a signature
> -0.0 DK_VERIFIED         Domain Keys: signature passes verification

Where to get these rules?

Regards,
zmi
--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: Question regarding meta's
Ruben Cardenal wrote:
> Hi,
>
> Let's say I have:
>
> header __ID1 /regexp1/
> header __ID2 /regexp2/
> header __ID3 /regexp3/
> meta MYID ((__ID1 + __ID2 + __ID3) > 1)
> score MYID 1
>
> When a message triggers MYID, is there any way in the X-Spam-Report of
> showing which individual parts of the meta the message matched?

No, but you can do something like this:

header ID1 /regexp1/
score ID1 0.0001
header ID2 /regexp2/
score ID2 0.0001
header ID3 /regexp3/
score ID3 0.0001
meta MYID ((ID1 + ID2 + ID3) > 1)
score MYID 1

This will force ID1-3 to be evaluated as normal rules and show up in the hit list, but will give them an insignificant score. (You can't make the score 0; that will disable them.)
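As a rough illustration of why the rename helps, here is a toy Python model of the behaviour described above. It is not SpamAssassin code: the evaluate function, the sample header text and the patterns are all invented for this sketch. The only assumptions taken from the thread are that sub-rules whose names start with "__" are evaluated but not listed, and that the meta fires when more than one sub-rule hits.

```python
import re


def evaluate(headers, rules, meta_name, threshold=1):
    """Toy model: return (rule names shown in the report, whether the meta fired)."""
    # Every sub-rule is evaluated against the header text.
    hits = [name for name, pattern in rules.items() if re.search(pattern, headers)]
    # Mirrors "meta MYID ((A + B + C) > 1)": fire when more than `threshold` hit.
    meta_fired = len(hits) > threshold
    # Names starting with "__" are evaluated but left out of the report.
    shown = [name for name in hits if not name.startswith("__")]
    if meta_fired:
        shown.append(meta_name)
    return shown, meta_fired


# Hypothetical message headers and patterns, for illustration only.
headers = "Subject: cheap pills\nX-Mailer: bulkmailer"
hidden = {"__ID1": r"cheap", "__ID2": r"bulkmailer", "__ID3": r"lottery"}
visible = {"ID1": r"cheap", "ID2": r"bulkmailer", "ID3": r"lottery"}

print(evaluate(headers, hidden, "MYID"))   # only MYID is reported
print(evaluate(headers, visible, "MYID"))  # ID1 and ID2 are reported alongside MYID
```

With the "__" names only the meta shows up; with the plain names the matching sub-rules appear as well, which is the effect the tiny 0.0001 scores are buying.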
Re: xxxl spam
On Apr 13, 2006, at 11:40 AM, mouss wrote:
> Matt Kettler wrote:
>> And even us US folks do have encoding issues. After all, English is
>> not our official language here in the US,
>
> what do you mean here? what would be your official language?

The US doesn't have an official language. By default, it is assumed to be English for most things, but it's not "Official". And, in some regions within the US, official govt signs and documents come in various languages. The reason has to do with liability and legality: since there's no official language, you can't just pick _one_ language to publish your forms in and be done with it. If you do, you're neglecting significant minority populations (and in some regions, those can be quite significant, such as Spanish speakers in southern Florida or southern California), which then makes you vulnerable to lawsuits saying that you're discriminating and/or being negligent toward those significant minorities who aren't required to speak English, because English isn't an official language.

In order to simplify this, some states have tried to enact official language legislation. Florida tried it. Someone put "Make English the official state language" on a ballot. The Cuban-American population in southern Florida got mad, and put "Make Spanish the official state language" on the ballot. Neither one passed, but the Spanish one got more votes. This pretty much silenced the "English as state language" movement in Florida, as their plan almost backfired on them.

I don't remember any other state trying it since. The states where there wouldn't be any opposition don't need to make it a law ... and in states like California where it could matter (reducing costs in govt overhead by eliminating multiple languages and the requirement for multilingual workers), the "English as state language" supporters are afraid of what almost happened in Florida.

So ... sorry for the long-winded explanation, but that's what he was saying.
Re: xxxl spam
mouss wrote:
>> However, it is true that the vast majority of the corpus currently
>> comes from folks who speak English (King's or Yankee) as a primary
>> language, and that's a bit of a problem as it creates considerable
>> bias in the rules.
>>
>> And even us US folks do have encoding issues. After all, English is
>> not our official language here in the US,
>
> what do you mean here? what would be your official language?

The United States of America does not have any official language. Americanized English is our common language, but it's not official.

This means that our government has to supply forms and materials in many languages for its citizens, because it cannot require that citizens speak English.

For example, we have tax forms in French:
http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf

Admittedly non-English forms and services are somewhat secondary here, but they are present.

>> and I've got plenty of users that speak
>> multiple languages, not all of which use plain-ascii.
>
> I guess so. now I'm not sure our situation isn't worse because people
> tried to find non standard solutions that are still used. I still
> remember the days when some customers were asking us to "fix" our
> software because "it broke their accents"... hopefully these times are
> gone, but I still see "broken" mail (much more than I should). actually,
> I also see mail that doesn't get rendered correctly on thunderbird. so
> I'll admit that the issue isn't really about accented chars...

Well, yours is certainly worse, or at least more prevalent, than the problem here in the US, but I would not say it's the worst.

Generally speaking the worst case seems to be present in smaller Asian nations, which have really extensive use of non-US characters. At least the French can restrict their text to the same character set as English and still be readable, although awkward due to the screwed up accents.

Also, smaller Asian nations still to this day have a high prevalence of locally-grown mail clients, many of which are not even remotely RFC compliant, but work well with others in the same locale. They're also much more likely to make use of mixed-language text containing many character sets. Speaking 2 or 3 different languages is fairly common in the smaller countries of the Asian region, just due to necessity for trade with neighboring countries.

Another area with this same basic issue would be the Middle East, but the number of completely different character sets is smaller.
Re: Question regarding meta's
On Thu, Apr 13, 2006 at 08:40:30PM +0200, Ruben Cardenal wrote:
> header __ID1 /regexp1/
> header __ID2 /regexp2/
> header __ID3 /regexp3/
> meta MYID ((__ID1 + __ID2 + __ID3) > 1)
>
> When a message triggers MYID, is there any way in the X-Spam-Report of
> showing which individual parts of the meta the message matched?

As far as I know, you can't do that without a plugin. You could write a small plugin such that _SUBTESTS_ or something would be rewritten to the list of subtests (names starting with "__") that hit, and then include that in the report.

--
Randomly Generated Tagline:
"It's a question of consistency. With a Republican president, I think you should just expect a certain amount of corruption -- And with a Democratic president, you should expect a [ bleep ] in the oval office." - Dave Foley on Politically Incorrect, 2001.12.07
Re:
Daniel Madaoui wrote:
> So I restart the spamd daemon with these options
>
> /usr/local/bin/spamd -d -m10 -u spamassassin ( spamassassin is a user
> with its directory /home/spamassassin/.spamassassin )
>
> It tries to use the .spamassassin directory that belongs to root
> (/root/.spamassassin/ )

Known bug, fixed in SA 3.1.0 and higher.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3900

Also be aware that unless your source has backported fixes, SA 3.0.3 is vulnerable to two different DoS attacks, each triggered by sending it a specially crafted message.

3.0.4, possibly older versions: "many to: headers" DoS vulnerability
http://secunia.com/advisories/17386/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-3351

3.0.1-3.0.3: malformed message with long headers DoS
http://secunia.com/advisories/15704/
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2005-1266
Question regarding meta's
Hi,

Let's say I have:

header __ID1 /regexp1/
header __ID2 /regexp2/
header __ID3 /regexp3/
meta MYID ((__ID1 + __ID2 + __ID3) > 1)
score MYID 1

When a message triggers MYID, is there any way in the X-Spam-Report of showing which individual parts of the meta the message matched?

Ruben
Re: xxxl spam
Matt Kettler wrote:
> mouss wrote:
>> I also understand that US guys may get less encoded subjects, but at
>> least in .fr, we have that all the time (because of our accented
>> letters, and because many companies still use software that predates
>> mime). and if I find a legitimate IP in a dnsbl used by SA, then I
>> just remove that dnsbl.
>
> Sounds like we need more non-US based corpus contributors. After all,
> the SA devs can only work with what they get.
>
> Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't
> based in the US. Last I checked he was in Ireland. Unfortunately this
> doesn't help with the encoding issue, as they still use ordinary English
> characters over there for most things. (I don't think Gaelic is very
> common in email.)
>
> So bear in mind that SA isn't just "developed in the US by US citizens
> for US markets".

oh, I never meant that.

> However, it is true that the vast majority of the corpus currently comes
> from folks who speak English (King's or Yankee) as a primary language,
> and that's a bit of a problem as it creates considerable bias in the
> rules.
>
> And even us US folks do have encoding issues. After all, English is not
> our official language here in the US,

what do you mean here? what would be your official language?

> and I've got plenty of users that speak multiple languages, not all of
> which use plain-ascii.

I guess so. now I'm not sure our situation isn't worse because people tried to find non standard solutions that are still used. I still remember the days when some customers were asking us to "fix" our software because "it broke their accents"... hopefully these times are gone, but I still see "broken" mail (much more than I should). actually, I also see mail that doesn't get rendered correctly on thunderbird. so I'll admit that the issue isn't really about accented chars...
spamd using a bayes and auto-whitelist common to everybody
It's better with a subject :(

I want to use SA for a lot of users who don't have home directories. Their mail is in /var/mail. Spam is still delivered to the recipient's /var/mail/user file, with the SA additions. The bayes and auto-whitelist databases will be common to everybody.

I use spamassassin 3.0.3 under freebsd 4.8.

I use postfix, and SA through procmail.

postfix main.cf:

mailbox_command = /usr/local/bin/procmail -t

I've got the config file for procmail in /usr/local/etc/procmailrc:

PATH=$HOME/bin:/usr/bin:/usr/ucb:/bin:/usr/local/bin:.
LOGFILE=/var/log/procmail.log

:0fw: $LOGNAME.lock
* < 256000
| /usr/local/bin/spamc

I launch spamd in this way:

/usr/local/bin/spamd -d -m10

and when I send a mail I get this log:

Apr 13 19:39:37 host spamd[48968]: spamd: setuid to root succeeded
Apr 13 19:39:37 host spamd[48968]: spamd: still running as root: user not specified with -u, not found, or set to root, falling back to nobody at /usr/local/bin/spamd line 1152, line 4.
Apr 13 19:39:37 host spamd[48968]: spamd: processing message <[EMAIL PROTECTED]> for root:65534
Apr 13 19:39:37 host spamd[48968]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.48968 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.48968 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:39:37 host spamd[48968]: spamd: clean message (-1.4/5.0) for root:65534 in 0.3 seconds, 744 bytes.
Apr 13 19:39:37 host spamd[48968]: spamd: result: . -1 - ALL_TRUSTED scantime=0.3,size=744,user=root,uid=65534,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1645,mid=<3822750E-3444-4F34-938F [EMAIL PROTECTED]>,autolearn=failed

The mail was in the mailbox but the bayes was not used.

So I restart the spamd daemon with these options:

/usr/local/bin/spamd -d -m10 -u spamassassin

(spamassassin is a user with its directory /home/spamassassin/.spamassassin)

It still tries to use the .spamassassin directory that belongs to root (/root/.spamassassin/):

Apr 13 19:50:53 host spamd[49552]: spamd: connection from localhost.example.com [127.0.0.1] at port 1982
Apr 13 19:50:53 host spamd[49552]: spamd: processing message <[EMAIL PROTECTED]> for root:3005
Apr 13 19:50:53 host spamd[49552]: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: auto-whitelist: open of auto-whitelist file failed: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/auto-whitelist.lock.example.com.49552 for /root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: bayes: locker: safe_lock: cannot create tmp lockfile /root/.spamassassin/bayes.lock.example.com.49552 for /root/.spamassassin/bayes.lock: Permission denied
Apr 13 19:50:53 host spamd[49552]: spamd: clean message (-1.4/5.0) for root:3005 in 0.1 seconds, 736 bytes.
Apr 13 19:50:53 host spamd[49552]: spamd: result: . -1 - ALL_TRUSTED scantime=0.1,size=736,user=root,uid=3005,required_score=5.0,rhost=localhost.example.com,raddr=127.0.0.1,rport=1982,mid=[EMAIL PROTECTED]>,autolearn=failed

How can I configure spamd to use another directory for the bayes and auto-whitelist databases (/home/spamassassin/.spamassassin)?

It works if I change the permissions of /root/.spamassassin, but that's not optimal.

Thanks for your help.
Re: xxxl spam
mouss wrote:
> I also understand that US guys may get less encoded subjects, but at least in
> .fr, we have that all the time (because of our accented letters, and because
> many companies still use software that predates mime). and if I find a
> legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.

Sounds like we need more non-US based corpus contributors. After all, the SA devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the US. Last I checked he was in Ireland. Unfortunately this doesn't help with the encoding issue, as they still use ordinary English characters over there for most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just "developed in the US by US citizens for US markets".

However, it is true that the vast majority of the corpus currently comes from folks who speak English (King's or Yankee) as a primary language, and that's a bit of a problem as it creates considerable bias in the rules.

And even us US folks do have encoding issues. After all, English is not our official language here in the US, and I've got plenty of users that speak multiple languages, not all of which use plain-ascii.
Re: xxxl spam
John Rudd wrote:
> I wouldn't do that.

Please note that I "said it the short way". Of course I don't jump to disable rules. I do check whether the message should have been flagged as spam (a "reasonable" FP). if so, that's life. If possible, I see if I can create a rule to make it get hammed without breaking the whole filter. If, however, the tests that made it classify as spam are not clear to me, then I check if I can lower some. but some tests just get disabled.

> Just because legitimate mail triggers some rule doesn't mean that the rule is flawed. Using your example, triggering "no_real_name" does not mean that the message is spam, it means that the message has _some_ similarity to at least some spam messages (the higher the score, the stronger the similarity). And, that's absolutely true: statistically, when looking at the corpus which was used to create the rules database, a higher percentage of "no_real_name" messages were spam.

As I already said in another thread, the statistical results depend on the attributes you are checking. the perceptron will not wake up and say "hey, come on, this attribute is not good". so, if you run a mass check with rules like:

- IP parity
- first letter of sender
- mailer: "the bat" for instance
- relay = comcast, free.fr, ...
- ...

then the perceptron will give you what you asked for: scores.

I also understand that US guys may get fewer encoded subjects, but at least in .fr, we have that all the time (because of our accented letters, and because many companies still use software that predates mime). and if I find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.

> Now, if legit messages were not just triggering those rules, but also triggering enough rules to be flagged as spam ... then I would lower the value of those rules, but not disable those rules.

I disable the rules, and if I get false negatives, I see what I can do. so far, (the very few) missed spam would have been missed anyway.

> But I would only do that if I could see that there was a large percentage of should-be-ham messages being flagged as spam by that rule AND that rule wasn't being useful in flagging spam messages. The reason is: if the message is being flagged, but it shouldn't have been, then perhaps my "corpus" of messages differs significantly enough from the SA internal corpus that my score values need to be different. But that doesn't mean that the rules are so disjoint from tracking spam that they should be entirely disabled. They just don't have the same weighting that my corpus needs. If, instead, most messages passing through my mail servers, that triggered that rule, really did seem to be spam, then I wouldn't alter the score at all. I would just pass the should-have-been-ham message into my bayesian learner and hope that a low bayes score for messages like that would offset the rules had flagged it as spam.

everybody has their own situation. I am very FP sensitive. I prefer getting spam to losing an important mail. after all, I do review my spam. so the fewer FPs there are, the faster I can review my junk folder.
dbg: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '
MySQL:

SHOW VARIABLES LIKE "character%"
Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    latin1
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/

SHOW VARIABLES LIKE "collation%"
Variable_name         Value
collation_connection  utf8_general_ci
collation_database    latin1_swedish_ci
collation_server      utf8_general_ci

SHOW CREATE TABLE bayes_token
CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY (`id`,`token`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Can't get Bayes to work. Here is my lint output:

[23913] dbg: logger: adding facilities: all
[23913] dbg: logger: logging level is DBG
[23913] dbg: generic: SpamAssassin version 3.1.1
[23913] dbg: config: score set 0 chosen.
[23913] dbg: util: running in taint mode? no
[23913] dbg: dns: is Net::DNS::Resolver available?
yes [23913] dbg: dns: Net::DNS version: 0.53 [23913] dbg: diag: perl platform: 5.008007 linux [23913] dbg: diag: module installed: MIME::Base64, version 3.05 [23913] dbg: diag: module installed: HTML::Parser, version 3.48 [23913] dbg: diag: module installed: Digest::SHA1, version 2.11 [23913] dbg: diag: module installed: DB_File, version 1.814 [23913] dbg: diag: module installed: Net::DNS, version 0.53 [23913] dbg: diag: module installed: Net::SMTP, version 2.29 [23913] dbg: diag: module installed: Mail::SPF::Query, version 1.998 [23913] dbg: diag: module installed: IP::Country::Fast, version 309.002 [23913] dbg: diag: module installed: Razor2::Client::Agent, version 2.80 [23913] dbg: diag: module installed: Net::Ident, version 1.20 [23913] dbg: diag: module installed: IO::Socket::INET6, version 2.51 [23913] dbg: diag: module installed: IO::Socket::SSL, version 0.97 [23913] dbg: diag: module installed: Time::HiRes, version 1.82 [23913] dbg: diag: module installed: DBI, version 1.50 [23913] dbg: diag: module installed: Getopt::Long, version 2.34 [23913] dbg: diag: module installed: LWP::UserAgent, version 2.033 [23913] dbg: diag: module installed: HTTP::Date, version 1.46 [23913] dbg: diag: module installed: Archive::Tar, version 1.28 [23913] dbg: diag: module installed: IO::Zlib, version 1.04 [23913] dbg: ignore: using a test message to lint rules [23913] dbg: config: using "/etc/mail/spamassassin" for site rules pre files [23913] dbg: config: read file /etc/mail/spamassassin/init.pre [23913] dbg: config: read file /etc/mail/spamassassin/v310.pre [23913] dbg: config: using "/var/lib/spamassassin/3.001001" for sys rules pre files [23913] dbg: config: using "/var/lib/spamassassin/3.001001" for default rules dir [23913] dbg: config: read file /var/lib/spamassassin/3.001001/updates_spamassassin_org.cf [23913] dbg: config: using "/etc/mail/spamassassin" for site rules dir [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_adult.cf [23913] dbg: config: read 
file /etc/mail/spamassassin/70_sare_bayes_poison_nxm.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum0.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_evilnum1.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj0.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj1.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj2.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj3.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_genlsubj_eng.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header0.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header1.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header2.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header3.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_header_eng.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_highrisk.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html0.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html1.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html2.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html3.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html4.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_html_eng.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu0.cf [23913] dbg: config: read file /etc/mail/spamassassin/70_sare_obfu1.cf [23
Re: TEXTAREA style="visibility: hidden"
[EMAIL PROTECTED] wrote:
> s/Scripting/CSS :hover/ is perfectly reasonable, though:
> http://www.meyerweb.com/eric/css/edge/menus/demo.html
> (doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...)

D'oh! I blame the coffee. There wasn't enough of it when I wrote my last post.

On the other hand, to apply :hover rules, you need an actual stylesheet and a way to select the element(s) you're showing. You could still apply the visibility/display rules inline, but you might as well just put them in the stylesheet.

That said, I'm probably guilty of using inline styles for this sort of thing myself -- just not in email.

--
Kelson Vibber
SpeedGate Communications
Re: relaydb and tarpit
Michael Monnerie wrote:
> On Thursday, 13 April 2006 18:15 mouss wrote:
>> pfff. just reading the two first paragraphs is enough to look elsewhere. some people seem to redefine what a false positive is.
> I didn't mean that, I meant the tarpitting approach. Of course you have to set some (much) harder policy on which systems to put on your tarpit-blackhole list. But *if* you have such a "tarpit decider without FP" (not sure how to do that...), couldn't this be a very good countermeasure to spam?

The issues are:

- to tarpit, you need to devote some process or thread to that. and this is not unix specific. however you do it, you'll need something to handle it. even with a packet filter, this still means many unnecessary states.
- the best you can do (at user level) is to have an asynchronous process (which can handle many connections) do it. now, either it is the listener, but then it needs to pass "good" connections to "good" listeners (which ones support this?), or the opposite (which ones support this?). of course, you can tune this to the point that you'd write a spam-OS, just to discover that spammers found other ways to get to you.
- the most severe problem is finding a criterion to decide who is bad. This is what we're all trying to do! If I knew which clients are used by spammers, I would need no tarpit nor DNSBL nor SA nor bayes. I would just block these.
- sometimes, some ideas seem fine, but they don't withstand serious analysis. you want to protect yourself, but that's just part of your goal. you want to do so at a limited cost and under some (non-explicit but real) conditions (killing all the non-white people will statistically reduce terrorism, but would you do that?).

I have already seen systems that go idle when I connect to them. These systems just make me use my resources in vain, which is not good practice. And I tend to believe these systems are run by nuts, so they are easily attacked (I never do that, for both personal and professional reasons). The best way to deal with them is to ignore them. route add, transport_maps, ... are enough to build one's own internet :)
Re: How was this missed?
> Sure, the pattern doesn't match. "." means there has to be some (any) character between the numbers. "984" has no characters between the numbers.

DOH!!! Thanks, you're right...
Re: xxxl spam
On Apr 13, 2006, at 9:56 AM, mouss wrote: I am also seing many legit mail trigering some SA rules (*_exess, no_real_name, x_library, ...). when I see this, I check the rule, and if I can't find a justification, I disable it. I wouldn't do that. Just because legitimate mail triggers some rule doesn't mean that the rule is flawed. Using your example, triggering "no_real_name" does not mean that the message is spam, it means that the message has _some_ similarity to at least some spam messages (the higher the score, the stronger the similarity). And, that's absolutely true: statistically, when looking at the corpus which was used to create the rules database, a higher percentage of "no_real_name" messages were spam. Now, if legit messages were not just triggering those rules, but also triggering enough rules to be flagged as spam ... then I would lower the value of those rules, but not disable those rules. But I would only do that if I could see that there was a large percentage of should-be-ham messages being flagged as spam by that rule AND that rule wasn't being useful in flagging spam messages. The reason is: if the message is being flagged, but it shouldn't have been, then perhaps my "corpus" of messages differs significantly enough from the SA internal corpus that my score values need to be different. But that doesn't mean that the rules are so disjoint from tracking spam that they should be entirely disabled. They just don't have the same weighting that my corpus needs. If, instead, most messages passing through my mail servers, that triggered that rule, really did seem to be spam, then I wouldn't alter the score at all. I would just pass the should-have-been-ham message into my bayesian learner and hope that a low bayes score for messages like that would offset the rules had flagged it as spam.
Re: How was this missed?
On Thu, Apr 13, 2006 at 09:55:59AM -0700, [EMAIL PROTECTED] wrote:
> >> 2*0*6*984-2327
>
> /2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8.?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6.?2.?0.?2.?2.?0.?3.?3/
>
> Or, perhaps, better:
>
> /2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5\D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3/

Now you won't catch

(206) 984-2327
[206] 984-2327
206 - 984 - 2327

etc. FYI.

--
Randomly Generated Tagline:
"Thinking of using NT for your critical apps? Isn't there enough suffering in the world?" - Sun Microsystems Ad
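The point about the missed formats can be checked with a small sketch. This is Python rather than a SpamAssassin rule, and `phone_pattern` and `max_gap` are hypothetical names made up for the illustration. It assumes that allowing up to three non-digits between digits is acceptable, which is a guess rather than anything proposed in the thread; a wider gap will also match more innocent text.

```python
import re

def phone_pattern(digits, max_gap):
    """Build a regex allowing 0..max_gap non-digit characters between digits.

    Hypothetical helper for illustration only; not part of SpamAssassin.
    """
    gap = r"\D{0,%d}" % max_gap
    # str.join over a string inserts the gap between each pair of digits.
    return re.compile(gap.join(digits))

samples = [
    "(206) 984-2327",
    "[206] 984-2327",
    "206 - 984 - 2327",
    "2*0*6*984-2327",
]

# max_gap=1 behaves like \D? between digits, as in the quoted rule.
narrow = phone_pattern("2069842327", 1)
# max_gap=3 is an assumed value, wide enough for " - " between groups.
wide = phone_pattern("2069842327", 3)

narrow_hits = [s for s in samples if narrow.search(s)]
wide_hits = [s for s in samples if wide.search(s)]

print("gap <= 1:", narrow_hits)
print("gap <= 3:", wide_hits)
```

With a gap of one, only the `2*0*6*984-2327` form is found, because `") "` and `" - "` put two or three characters between digits. With a gap of three, all four samples are found.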
Re: TEXTAREA style="visibility: hidden"
On Thu, Apr 13, 2006 at 09:45:13AM -0700, Kelson wrote:
> Nope. No legit uses in email that I can think of.

Just because you can't think of a use doesn't mean people don't use them. I see a lot of:
RE: New bayes poison
[EMAIL PROTECTED] wrote: > The spammer used the Yahoo! webmail infrastructure (probably via an > automated HTTP client) to send his spam. I've been reporting spam with good DK signatures to the mail provider: http://add.yahoo.com/fast/help/us/mail/cgi_spam https://services.google.com/inquiry/gmail_security2 DK and SPF are very useful in proving accountability for email sent. -- Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902 Hispanic Business Inc./HireDiversity.com Software Engineer
Re: How was this missed?
On Thursday 13 April 2006 11:55, [EMAIL PROTECTED] wrote: > Theo Van Dinter wrote: > > On Thu, Apr 13, 2006 at 10:39:29AM -0600, wrote: > >> Any idea how this one got through? > >> > >> body BRIAN_PHONE_NUMBERS > >> > /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7 > .9|2.0.6.3.3.8.6.0.6.1|2.0.6 > >> .2.0.2.2.0.3.3/ There's a ruleset I use from: http://www.emtinc.net/includes/chickenpox.cf .. that checks for the d.i.f.f.e.r.e.n.t kinds of spacing like this... a lot of the spam that has those kinds of characteristics will have several of the CHICKENPOX_ rules that have fired positive. It checks for some 60+ different patterns.. describe J_CHICKENPOX_12 1alpha-pock-2alpha describe J_CHICKENPOX_13 1alpha-pock-3alpha describe J_CHICKENPOX_14 1alpha-pock-4alpha describe J_CHICKENPOX_15 1alpha-pock-5alpha describe J_CHICKENPOX_16 1alpha-pock-6alpha describe J_CHICKENPOX_17 1alpha-pock-7alpha describe J_CHICKENPOX_18 1alpha-pock-8alpha describe J_CHICKENPOX_19 1alpha-pock-9alpha describe J_CHICKENPOX_110 1alpha-pock-10alpha describe J_CHICKENPOX_111 1alpha-pock-11alpha describe J_CHICKENPOX_21 2alpha-pock-1alpha describe J_CHICKENPOX_22 2alpha-pock-2alpha describe J_CHICKENPOX_23 2alpha-pock-3alpha describe J_CHICKENPOX_24 2alpha-pock-4alpha describe J_CHICKENPOX_25 2alpha-pock-5alpha describe J_CHICKENPOX_26 2alpha-pock-6alpha describe J_CHICKENPOX_27 2alpha-pock-7alpha describe J_CHICKENPOX_28 2alpha-pock-8alpha describe J_CHICKENPOX_29 2alpha-pock-9alpha describe J_CHICKENPOX_210 2alpha-pock-10alpha describe J_CHICKENPOX_31 3alpha-pock-1alpha describe J_CHICKENPOX_32 3alpha-pock-2alpha describe J_CHICKENPOX_33 3alpha-pock-3alpha describe J_CHICKENPOX_34 3alpha-pock-4alpha describe J_CHICKENPOX_35 3alpha-pock-5alpha describe J_CHICKENPOX_36 3alpha-pock-6alpha describe J_CHICKENPOX_37 3alpha-pock-7alpha describe J_CHICKENPOX_38 3alpha-pock-8alpha describe J_CHICKENPOX_39 3alpha-pock-9alpha describe J_CHICKENPOX_41 4alpha-pock-1alpha describe 
J_CHICKENPOX_42 4alpha-pock-2alpha describe J_CHICKENPOX_43 4alpha-pock-3alpha describe J_CHICKENPOX_44 4alpha-pock-4alpha describe J_CHICKENPOX_45 4alpha-pock-5alpha describe J_CHICKENPOX_46 4alpha-pock-6alpha describe J_CHICKENPOX_47 4alpha-pock-7alpha describe J_CHICKENPOX_48 4alpha-pock-8alpha describe J_CHICKENPOX_51 5alpha-pock-1alpha describe J_CHICKENPOX_52 5alpha-pock-2alpha describe J_CHICKENPOX_53 5alpha-pock-3alpha describe J_CHICKENPOX_54 5alpha-pock-4alpha describe J_CHICKENPOX_55 5alpha-pock-5alpha describe J_CHICKENPOX_56 5alpha-pock-6alpha describe J_CHICKENPOX_57 5alpha-pock-7alpha describe J_CHICKENPOX_61 6alpha-pock-1alpha describe J_CHICKENPOX_62 6alpha-pock-2alpha describe J_CHICKENPOX_63 6alpha-pock-3alpha describe J_CHICKENPOX_64 6alpha-pock-4alpha describe J_CHICKENPOX_65 6alpha-pock-5alpha describe J_CHICKENPOX_66 6alpha-pock-6alpha describe J_CHICKENPOX_71 7alpha-pock-1alpha describe J_CHICKENPOX_72 7alpha-pock-2alpha describe J_CHICKENPOX_73 7alpha-pock-3alpha describe J_CHICKENPOX_74 7alpha-pock-4alpha describe J_CHICKENPOX_75 7alpha-pock-5alpha describe J_CHICKENPOX_81 8alpha-pock-1alpha describe J_CHICKENPOX_82 8alpha-pock-2alpha describe J_CHICKENPOX_83 8alpha-pock-3alpha describe J_CHICKENPOX_84 8alpha-pock-4alpha describe J_CHICKENPOX_91 9alpha-pock-1alpha describe J_CHICKENPOX_92 9alpha-pock-2alpha describe J_CHICKENPOX_93 9alpha-pock-3alpha describe J_CHICKENPOX_101 10alpha-pock-1alpha describe J_CHICKENPOX_102 10alpha-pock-2alpha -- Tyler Nally [EMAIL PROTECTED] 317-989-2028
Re: New bayes poison
Good afternoon, Michael, On Thu, 13 Apr 2006, Michael Monnerie wrote: Hi, I just received some new bayes poison attempt. I never had one so large, maybe that could start to be a bit of problem? To the best of my knowledge, it isn't. Temporarily you get more hapaxes (tokens seen just once) in your bayes data, but those will get expired sooner or later. There's no effect on accuracy if the tokens truly are seen once. If they show up again in spam, it actually helps because the phrases help identify the second spam. Cheers, - Bill --- "Computers let you make more mistakes faster than any other invention in human history, with the possible exception of handguns and tequila." -- Mitch Radcliffe (Courtesy of Hugo van der Kooij <[EMAIL PROTECTED]>) -- William Stearns ([EMAIL PROTECTED]). Mason, Buildkernel, freedups, p0f, rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org --
Re: xxxl spam
John Rudd wrote: While I don't disagree with your assessment of XP systems, I have a different hunch about why such a large percentage of the mail coming from XP systems is spam, and a smaller percentage of mail coming from the other systems is spam: a) In general, XP systems are not servers, and therefore, are not mail servers. b) Due to (a), if you do your mail/spam/virus scanning on machines that do not receive direct connections from your own clients (mail/spam/virus scanning at the border), OR if you do not have a high percentage of XP clients in your domain, then your scanning systems will not receive many (if any) legitimate direct connections from XP clients ... because a legitimate mail sending process on an XP system will be directly connecting to their own domain's mail server, and not to YOUR mail scanning systems. c) Thus, if you meed the conditions in (b), and if we accept (a) as true, then the vast majority of connections you receive from XP systems, on your mail scanning systems, will be from spam/virus bots trying to directly submit spam or virus laden messages to your mail gateways instead of submitting it to their own mail servers (as bots are known to do). We would expect to see a lower percentage of spam from server type OSes (or OSes that can be clients or servers) because a higher percentage of those platforms are used as legitimate mail servers. The other factor here is: while I _hate_ linux, how much of the spam being submitted by linux boxes is merely a mail server relaying on behalf of one of their infected clients? (same with the unix systems, and the 2000/2003 systems) And thus not at all indicative of the quality of linux systems administration out on the internet. I think this is one of those cases where "the statistics work as blind observations of behavior, but attempting to describe _why_ the statistics works is not something you can sum up with a simple an straight forward explanation". Kinda like QM. 
I agree that statistics aren't the whole story. you can study the percentage of thieves/criminals based on skin color and origin (some people already do it, and many jump to conclusions without studies). but you can do the same study based on the social situation and past history of people. the first "researcher" will probably conclude that black/arabic/latin/... people are "more" criminal. the second "researcher" will instead conclude that criminality is seen more in poor communities, but that these aren't the worst criminals (killing vs stealing for instance).

back to xp and co. my feeling (no, I didn't run a study and won't) is that even if a study showed that we get more spam from XP than from linux, I would not use this to classify my mail. I am certain that if you do stats on mail date, you'll find that some dates correspond to more spam than others. we've already seen people jumping to block specific mailers (the bat for instance) based on their stats.

I am also seeing a lot of legit mail triggering some SA rules (*_exess, no_real_name, x_library, ...). when I see this, I check the rule, and if I can't find a justification, I disable it.
RE: How was this missed?
Theo Van Dinter wrote:
> On Thu, Apr 13, 2006 at 10:39:29AM -0600, wrote:
>> Any idea how this one got through?
>>
>> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
>>
>> A Gen_uine Coll`ege Deg.ree in 2 weeks Cal_l us now_!->
>> 2*0*6*984-2327
>
> Sure, the pattern doesn't match. "." means there has to be some (any) character between the numbers. "984" has no characters between the numbers.

Fixed version:

/2.?0.?6.?9.?8.?4.?2.?3.?2.?7|2.?0.?6.?3.?3.?3.?0.?0.?5.?1|2.?0.?6.?9.?8.?4.?0.?1.?0.?6|3.?3.?8.?3.?5.?7.?9|2.?0.?6.?3.?3.?8.?6.?0.?6.?1|2.?0.?6.?2.?0.?2.?2.?0.?3.?3/

Or, perhaps, better:

/2\D?0\D?6\D?9\D?8\D?4\D?2\D?3\D?2\D?7|2\D?0\D?6\D?3\D?3\D?3\D?0\D?0\D?5\D?1|2\D?0\D?6\D?9\D?8\D?4\D?0\D?1\D?0\D?6|3\D?3\D?8\D?3\D?5\D?7\D?9|2\D?0\D?6\D?3\D?3\D?8\D?6\D?0\D?6\D?1|2\D?0\D?6\D?2\D?0\D?2\D?2\D?0\D?3\D?3/

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com
Software Engineer
Re: How was this missed?
Please start a new thread instead of replying to an unrelated message.

Thursday 13 April 2006 18:39 wrote:
> Any idea how this one got through?
>
> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
> describe BRIAN_PHONE_NUMBERS Phone number or address pulled from spam
> score BRIAN_PHONE_NUMBERS 5.5

A period (.) matches exactly one arbitrary character (except newline). Try putting a question mark (?) after each period.

> - Message -
>
> Good day,
>
> A Gen_uine Coll`ege Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327
>
> Within 2 weeks! No Study Required! 1_0_0_% Veri.fiable!
>
> Right now the following deg.rees are being offered:
>
> B/A, .B/S/C,.M/A,.M/S/C,.M/B/A, .P/H/D,
>
> C.al_l us now_ for more information, 2*0*6*984-2327

--
Magnus Holmgren
Re: How was this missed?
On Thu, Apr 13, 2006 at 10:39:29AM -0600, wrote:
> Any idea how this one got through?
>
> body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
>
> A Gen_uine Coll`ege Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327

Sure, the pattern doesn't match. "." means there has to be some (any) character between the numbers. "984" has no characters between the numbers.

--
Randomly Generated Tagline:
1-900-Tech Support...hold...all operators are busy.
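The explanation above can be checked against the spam line itself. This is a minimal sketch in Python rather than in SpamAssassin; for this case Python's `re` module treats `.` and `.?` the same way Perl does. Only the first alternative of the rule is used, and the variable names are illustrative.

```python
import re

# The phone number line quoted from the spam message.
text = "Cal_l us now_!-> 2*0*6*984-2327"

# First alternative of the original rule: each "." must consume one character.
strict = re.compile(r"2.0.6.9.8.4.2.3.2.7")

# Same alternative with every "." made optional via "?".
relaxed = re.compile(r"2.?0.?6.?9.?8.?4.?2.?3.?2.?7")

strict_hit = strict.search(text) is not None
relaxed_hit = relaxed.search(text) is not None

print("strict matches: ", strict_hit)
print("relaxed matches:", relaxed_hit)
```

The strict form fails at `984`, where no separator sits between the digits for `.` to consume. The relaxed form matches because each separator is allowed to be absent.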
RE: TEXTAREA style="visibility: hidden"
Kelson wrote: > (3) Scripting that will show and hide sections in response to time or > user interaction. ... > #3 shouldn't even be a consideration, since HTML-capable email clients > should have scripting disabled for safety reasons. s/Scripting/CSS :hover/ is perfectly reasonable, though: http://www.meyerweb.com/eric/css/edge/menus/demo.html (doesn't work in IE 6, but works fine in Firefox, Safari, IE 7b2pr...) -- Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902 Hispanic Business Inc./HireDiversity.com Software Engineer
Re: TEXTAREA style="visibility: hidden"
Matthias Keller wrote: In my opinion you shouldn't limit it to textareas as I've seen them on DIVs and others too... So to me, any visibility:hidden or display:none is suspect as I dont see any legitimate use in emails Hmm... The main uses I can think of for display:none and visibility:hidden are: (1) Serving the same content to different media (for instance, set a page so that the navigation area doesn't appear when you print it) (2) Replacing content (as in CSS techniques to replace text with graphical headlines) (3) Scripting that will show and hide sections in response to time or user interaction. (4) Creating machine-readable content that the user will not see. (keyword stuffing, bayes poison, black-hat SEO, honeypot seeding, etc.) #1 isn't a good fit with email, since the main things you'd want to leave out of a print version are more likely to be in the mail client UI than part of the message body. Though it might be useful for providing a handheld-friendly view. Even so, it wouldn't work with inline styles, only with an attached or embedded stylesheet. #2 is pretty much useless in email. If you want a text alternative, you're better off providing a text/plain version of the message. #3 shouldn't even be a consideration, since HTML-capable email clients should have scripting disabled for safety reasons. #4 is mostly deceptive. If you need to provide metadata in an HTML doc, well, that's what META tags are for. If you need to provide metadata in an email message, you've got headers, you can add an XML attachment, etc. Nope. No legit uses in email that I can think of. -- Kelson Vibber SpeedGate Communications
How was this missed?
Guys,

Any idea how this one got through?

body BRIAN_PHONE_NUMBERS /2.0.6.9.8.4.2.3.2.7|2.0.6.3.3.3.0.0.5.1|2.0.6.9.8.4.0.1.0.6|3.3.8.3.5.7.9|2.0.6.3.3.8.6.0.6.1|2.0.6.2.0.2.2.0.3.3/
describe BRIAN_PHONE_NUMBERS Phone number or address pulled from spam
score BRIAN_PHONE_NUMBERS 5.5

- Message -

Good day,

A Gen_uine Coll`ege Deg.ree in 2 weeks Cal_l us now_!-> 2*0*6*984-2327

Within 2 weeks! No Study Required! 1_0_0_% Veri.fiable!

Right now the following deg.rees are being offered:

B/A, .B/S/C,.M/A,.M/S/C,.M/B/A, .P/H/D,

C.al_l us now_ for more information, 2*0*6*984-2327

TTYL,
Vilma Milton
Re: relaydb and tarpit
On Thursday, 13 April 2006 18:15 mouss wrote:
> pfff. just reading the two first paragraphs is enough to look elsewhere. some people seem to redefine what a false positive is.

I didn't mean that, I meant the tarpitting approach. Of course you have to set some (much) harder policy on which systems to put on your tarpit-blackhole list. But *if* you have such a "tarpit decider without FP" (not sure how to do that...), couldn't this be a very good countermeasure to spam?

mfg zmi
--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: xxxl spam
On Apr 13, 2006, at 12:12 AM, Loren Wilton wrote:
> I'd like to venture the suggestion that the percentage of spam from XP isn't necessarily an indication of inherent bugginess. It is more an indication that it is an OS for Clueless Noobs who haven't a clue about maintaining a system, avoiding a virus, or even able to tell if they have a virus. These are the machines that turn into zombies.

While I don't disagree with your assessment of XP systems, I have a different hunch about why such a large percentage of the mail coming from XP systems is spam, and a smaller percentage of mail coming from the other systems is spam:

a) In general, XP systems are not servers, and therefore, are not mail servers.

b) Due to (a), if you do your mail/spam/virus scanning on machines that do not receive direct connections from your own clients (mail/spam/virus scanning at the border), OR if you do not have a high percentage of XP clients in your domain, then your scanning systems will not receive many (if any) legitimate direct connections from XP clients ... because a legitimate mail sending process on an XP system will be directly connecting to their own domain's mail server, and not to YOUR mail scanning systems.

c) Thus, if you meet the conditions in (b), and if we accept (a) as true, then the vast majority of connections you receive from XP systems, on your mail scanning systems, will be from spam/virus bots trying to directly submit spam or virus-laden messages to your mail gateways instead of submitting them to their own mail servers (as bots are known to do).

We would expect to see a lower percentage of spam from server-type OSes (or OSes that can be clients or servers) because a higher percentage of those platforms are used as legitimate mail servers.

The other factor here is: while I _hate_ linux, how much of the spam being submitted by linux boxes is merely a mail server relaying on behalf of one of their infected clients? (same with the unix systems, and the 2000/2003 systems) And thus not at all indicative of the quality of linux systems administration out on the internet.

I think this is one of those cases where "the statistics work as blind observations of behavior, but attempting to describe _why_ the statistics work is not something you can sum up with a simple and straightforward explanation". Kinda like QM.
Re: relaydb and tarpit
Michael Monnerie wrote:
> Sorry for x-posting, but that's a program useful to postfix and/or SA users.
> http://www.benzedrine.cx/relaydb.html
> Does anybody use or know about this program with tarpitting? It sounds very interesting, and for the author it seems to work, but I'd like to know if others made good or bad experience with it. After all, we're all fighting spammers, and if there are solutions really working, I'm ready to implement it into our servers.

pfff. just reading the first two paragraphs is enough to look elsewhere. some people seem to redefine what a false positive is. they think that just because they reject mail or because the client/sender/... misbehaves, then it's not a false positive. This is just silly. a false positive is when a classifier considers a legitimate mail as spam, be that by rejection, by discarding, by delivering to a junk folder, ... etc.

just say no...
Re: TEXTAREA style="visibility: hidden"
Matt Kettler wrote:
> Matthias Keller wrote:
>> Matt Kettler wrote:
>>> Magnus Holmgren wrote:
>>>> I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor at SARE.
>>>
>>> It certainly seems worth testing.
>>>
>>> Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long):
>>>
>>> rawbody L_STYLE_HIDDEN /<textarea[^>]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i
>>> describe L_STYLE_HIDDEN has text with hidden visibility style
>>> score L_STYLE_HIDDEN 0.1
>>>
>>> I added some allowance for other declarations in the textarea tag, and the insertion of whitespace at various spots...
>>>
>>> It may need further tweaking/tuning, but it's a first-stab.
>>
>> Hi Matt
>>
>> I'm using this rule for quite some time now:
>>
>> rawbody MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i
>> describe MKE_HIDDEN1 Contains CSS-hidden text
>> score MKE_HIDDEN1 3.5
>
> That seems to be a nicer rule. My only concern would be that <[^>]* could be rather slow. I'd change the * to a range-limit, to prevent SA from digging through the entire body of a message that happens to be text/plain and starts off with a < and has no > anywhere in it.

Good idea. Thanks for pointing that out. Maybe a meta rule with IS_HTML, or whatever that's called, might be a good idea too.

Let me know your mass check results then.

Matt
Re: TEXTAREA style="visibility: hidden"
Matthias Keller wrote: > Matt Kettler wrote: >> Magnus Holmgren wrote: >> >>> I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to hide bayes poison. Shouldn't a rule against that, or >>> CSS-hidden text in general, be worthwhile? I couldn't find any in the >>> default 3.1.1 ruleset, nor at SARE. >>> >> >> It certainly seems worth testing. >> >> Here's a rule I wrote (caution: word-wraps.. this should be 3 lines >> long): >> >> rawbody L_STYLE_HIDDEN /<textarea[^>]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i >> describe L_STYLE_HIDDEN has text with hidden visibility style >> score L_STYLE_HIDDEN 0.1 >> >> I added some allowance for other declarations in the textarea tag, and >> the insertion of whitespace at various spots... >> >> It may need further tweaking/tuning, but it's a first-stab. >> > Hi Matt > > I've been using this rule for quite some time now: > > rawbody MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i > describe MKE_HIDDEN1 Contains CSS-hidden text > score MKE_HIDDEN1 3.5 > That seems to be a nicer rule. My only concern would be that <[^>]* could be rather slow. I'd change the * to a range limit, to prevent SA from digging through the entire body of a message that happens to be text/plain, starts off with a <, and has no > anywhere in it.
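The range-limit suggestion above can be tried outside SpamAssassin. What follows is only a rough Python sketch, not a tested SA rule: the limit of 200 characters and the names UNBOUNDED, BOUNDED and matches are assumptions made for illustration, and Python's re module stands in for the Perl regex engine that SA actually uses.

```python
import re

# The pattern as posted in the thread, with unbounded [^>]* runs.
UNBOUNDED = re.compile(
    r'<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)', re.I)

# Hypothetical bounded variant: the limit of 200 is an assumption,
# chosen only to show the idea of capping how far the engine scans.
BOUNDED = re.compile(
    r'<[^>]{0,200}\bstyle=[^>]{0,200}(?:visibility:\s*hidden|display:\s*none)',
    re.I)


def matches(pattern, text):
    """Return True if the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None


# A tag that hides its content, a harmless styled tag, and a plain-text
# body that starts with '<' and never contains '>'.
hidden = '<div class="x" style="color:red; display: none">poison words</div>'
plain = '<p style="color: blue">hello</p>'
no_close = '< ' + 'lorem ipsum ' * 2000

print(matches(BOUNDED, hidden), matches(BOUNDED, plain), matches(BOUNDED, no_close))
```

With the bounded form, the engine gives up on the '<' in the last sample after at most 200 characters instead of walking the whole body. Whether that matters in practice would depend on how SA feeds rawbody text to the rule, which this sketch does not model.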
Proper use of user_prefs "whitelist"
I've been having some difficulty with the user_prefs file and the whitelist_* functions. I read the examples etc., and I believe these are correct, but clearly certain email is still being tagged (see below). I wonder if someone can help clarify what I'm doing wrong here. First, here are the directives in my ~/.spamassassin/user_prefs file, as they apply to this instance: whitelist_from_rcvd spamassassin.apache.org hermes.apache.org whitelist_from *.apache.org Here is the Sendmail log, showing the rejection: Apr 13 11:52:24 mail sm-mta[34951]: k3DFqNBR034951: from=<[EMAIL PROTECTED]>, size=17514, class=-60, nrcpts=1, msgid=<[EMAIL PROTECTED]>, proto=SMTP, daemon=MTA, relay=hermes.apache.org [209.237.227.199] Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add: header: X-Spam-Flag: YES Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter add: header: X-Spam-Status: Yes, score=9.0 required=5.0 tests=HTML_00_10,HTML_MESSAGE,\n\tJ_CHICKENPOX_12,J_CHICKENPOX_33,RCVD_IN_SORBS,SARE_BIZOP,\n\tSARE_COLLEGE_SCAM,TVD_FUZZY_DEGREE autolearn=no version=3.1.1 Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: Milter: data, reject=550 5.7.1 Blocked by SpamAssassin Apr 13 11:52:26 mail sm-mta[34951]: k3DFqNBR034951: to=<[EMAIL PROTECTED]>, delay=00:00:02, pri=155514, stat=Blocked by SpamAssassin Thanks in advance
Re: TEXTAREA style="visibility: hidden"
On Thu, Apr 13, 2006 at 03:58:01PM +0200, Magnus Holmgren wrote: > I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to > hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in > general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor > at SARE. Not specific to textarea, just looking for an html tag with that style setting: 0.878 0.9903 0.3319 0.749 0.00 1.00 TVD_VIS_HIDDEN Specifically just looking for textarea: 0.821 0.9903 0.1.000 1.00 1.00 TVD_VIS_HIDDEN I added the second one to my sandbox. We'll see how the nightly mass-checks deal with it. :) Thanks! :) -- Randomly Generated Tagline: "Do not meddle in the affairs of wizards, for they are subtle and quick to anger." - Lord of the Rings
Russian Spam
I have received several copies of a spam message that is in Russian (I think it's Russian). I get maybe 1 or 2 a week. I wish I could block all Russian messages, but we are a University and could easily have Russian students. I am unable to read this message and therefore have no ideas on how to block this. Can anyone help me out with suggestions? I apologize if this has been discussed in the last week. I haven't had time to catch up on list messages over the last couple of days and didn't see anything skimming the subjects of recent threads. Thanks, Kris Message with full headers below: Microsoft Mail Internet Headers Version 2.0 Received: from gateway3.oc.edu ([205.143.222.12]) by fsmail.oc.edu with Microsoft SMTPSVC(6.0.3790.211); Thu, 13 Apr 2006 08:50:17 -0500 Received: from ip-189.net-82-216-33.toulouse.rev.numericable.fr ([82.216.33.189])(helo=ip-189.net-82-216-33.toulouse.rev.numericable.fr) by gateway3.oc.edu with smtp (Exim 4.54) id 1FU2CH-0008JS-AY for [EMAIL PROTECTED]; Thu, 13 Apr 2006 08:49:43 -0500 From: "Litvinova Elena" <[EMAIL PROTECTED]> To: "Samusenko Tat'jana" <[EMAIL PROTECTED]> Date: Thu, 13 Apr 2006 13:50:06 + Message-ID: <[EMAIL PROTECTED]> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="koi8-r"; reply-type=original Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1441 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 X-SA-Exim-Connect-IP: 82.216.33.189 X-SA-Exim-Rcpt-To: [EMAIL PROTECTED] X-SA-Exim-Mail-From: [EMAIL PROTECTED] X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on gateway3.oc.edu X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=DNS_FROM_AHBL_RHSBL,RELAY_FR autolearn=disabled version=3.1.0 Subject: Re[6]: =?koi8-r?B?9Nkgzc7Px88gxMzRIM3FztEg2s7B3snb2A==?= davavsheju X-SA-Exim-Version: 4.2 (built Thu, 03 Mar 2005 10:44:12 +0100) X-SA-Exim-Scanned: Yes (on gateway3.oc.edu) Return-Path: [EMAIL PROTECTED] 
X-OriginalArrivalTime: 13 Apr 2006 13:50:17.0572 (UTC) FILETIME=[32A1FA40:01C65F01] Рад Вас снова видеть! Вы собираетесь в США? Хотите свободно работать с технической документацией? Расширить свой кругозор? Центр Американского Английского приглашает выучить английский язык!!! Все стадии обучения - от нуля до высшего. Ассоциативно- образная методика. Преподаватели из США. Без больших скидок не уйдёте! :) Наши телефоны в Москве: 105 пять-один-восемь-шесть два-три-восемь-три-три-восемь-шесть Не хотите получать информацию от Центра? Отправьте свой адрес нам: [EMAIL PROTECTED] сил. Но он не мог понять того, -- вдруг как бы вырвавшимся тонким голосом закричал князь Андрей, -- но он не мог понять, что мы в первый раз дрались там за русскую землю, что в войсках был такой дух, какого никогда я не видал, что мы два дня сряду отбивали французов и что этот успех удесятерял наши силы. Он велел отступать, и все усилия и потери пропали даром. Он не думал об измене, он старался все сделать как можно лучше, он все обдум от этого-то он и не годится. Он не годится теперь именно потому, что он все обдумывает очень основательно и аккуратно, как и следует всякому немцу. Как бы тебе сказать... Ну, у отца твоего немец-лакей, и он прекрасный лакей и удовлетворит всем его нуждам лучше тебя, и пускай он служит; но ежели отец при смерти болен, ты прогонишь лакея и своими непривычными, неловкими станешь ходить за отцом и лучше успокоишь его, чем искусный, но чужой человек. Так и сделали с Барклаем. Пока Россия была здорова, ей мог служить
Re: TEXTAREA style="visibility: hidden"
Matt Kettler wrote: Magnus Holmgren wrote: I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor at SARE. It certainly seems worth testing. Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long): rawbody L_STYLE_HIDDEN /<textarea[^>]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i describe L_STYLE_HIDDEN has text with hidden visibility style score L_STYLE_HIDDEN 0.1 I added some allowance for other declarations in the textarea tag, and the insertion of whitespace at various spots... It may need further tweaking/tuning, but it's a first-stab. Hi Matt I've been using this rule for quite some time now: rawbody MKE_HIDDEN1 /<[^>]*\bstyle=[^>]*(?:visibility:\s*hidden|display:\s*none)/i describe MKE_HIDDEN1 Contains CSS-hidden text score MKE_HIDDEN1 3.5 In my opinion you shouldn't limit it to textareas, as I've seen this on DIVs and other tags too. So to me, any visibility:hidden or display:none is suspect, as I don't see any legitimate use for it in email. This rule matches around 4% of all my spam, and I haven't seen any ham matches yet. Feel free to mass-check it and/or include it in your rules. But if you do, please let me know so that I can remove my local copy. Matt
Re: TEXTAREA style="visibility: hidden"
Bowie Bailey wrote: > JD Smith wrote: >> So, what exactly is bayes poison? > > "Bayes poison" is a collection of random words or text selections that > have nothing to do with the email subject and are only there in an > attempt to confuse the Bayes database. This doesn't really work the > way the spammers would like to think it does, but they keep doing it > anyway. How well bayes poison works depends a lot on your "bayes" implementation. Some "bayes" implementations are fairly susceptible to this. (I put "bayes" in quotes because not all bayes implementations are really Bayesian at all. Actually, most are not, including SA.) In particular, the choice of combining algorithm seems to matter a lot. The use of chi-squared combining, instead of true Bayesian combining, seems to make SA's bayes rather resistant to this. (note: the use of chi-squared is not exclusive to SA.. many "bayes" implementations do this, but not all.) Another area of influence is the choice of tokens. Words vs chars, hapaxes, etc all change how a bayes implementation reacts to poisoning attempts. So spammers keep using bayes poison because it works in some cases. It also doesn't really hurt them much, and sometimes even helps them, against more resistant implementations.
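To make the difference between the two combining approaches a bit more concrete, here is a rough Python sketch. It follows the chi-squared (Fisher-style) combining popularised by Gary Robinson and used in SpamBayes; that SA's own code matches it line for line is an assumption, and the function names chi2q, naive_bayes_combine and chi_squared_combine are made up for this example. The inputs are per-token spam probabilities strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-squared variable with `dof` degrees of
    freedom is >= x2. Only valid for even, positive dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even number")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


def naive_bayes_combine(probs):
    """Classic naive-Bayes combination of per-token spam probabilities."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def chi_squared_combine(probs):
    """Chi-squared combination: test the spam and ham hypotheses
    separately, then average the two indicators into one score."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


# Three strongly spammy tokens plus ten mildly hammy "poison" tokens.
mixed = [0.99] * 3 + [0.2] * 10
print(naive_bayes_combine(mixed), chi_squared_combine(mixed))
```

Feeding both functions the same token list, with and without a block of poison tokens, is a quick way to see how each one reacts. The sketch makes no claim about which token probabilities a real database would assign to poison words.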
SpamAssassin BZ downtime
http://ajax.apache.org/%7ejefft/ : Bugzilla is moving to a new host, and is temporarily down while the database synchs. Apologies for the inconvenience. --j.
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote: > Agreed, this rule is completely inappropriate, it penalizes valid > encoding according to RFC 2047 and fires on any lengthier Subject > line in non-English language. It should disappear or have a > much reduced default score. Says you. ;) 1.047 1.4619 0.0792 0.949 0.58 0.89 SUBJECT_ENCODED_TWICE So in the results used to generate scores, that rule is ~94.9% accurate, and hits ~1.46% of all spam. In a recent nightly mass-check run: 1.153 1.4173 0.1151 0.925 0.73 0.89 SUBJECT_ENCODED_TWICE So more ham seems to use encoding twice in the subject, and a little less spam uses it. Based on this, my guess is the generated score would go down. The thing to remember about rules is that they neither necessarily look for RFC non-compliance, nor do they avoid RFC compliant mails. They look for features that hit spam and try to avoid hitting ham. The key there is that rule development occurs with the results people make available. If the people generating results don't receive ham mails that, for instance, use multiple encodings in a Subject header, the results won't indicate that it occurs in ham very much. -- Randomly Generated Tagline: "I protect home plate like a mormon girl on prom night." - Mimi on the Drew Carey show
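For anyone wondering where the ~94.9% figure comes from: assuming the columns in those result lines are overall%, spam%, ham%, S/O, rank and score (the thread does not label them, so this column order is an assumption), the S/O value is simply the spam hit rate divided by the sum of the spam and ham hit rates. A small Python sketch, with so_ratio being a made-up helper name:

```python
def so_ratio(spam_pct, ham_pct):
    """Fraction of a rule's hits that land on spam, computed from the
    percentage of spam and the percentage of ham that the rule hits."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing, S/O is undefined")
    return spam_pct / total


# Figures quoted in the message above.
score_set = so_ratio(1.4619, 0.0792)   # results used to generate scores
nightly = so_ratio(1.4173, 0.1151)     # recent nightly mass-check run
print(round(score_set, 3), round(nightly, 3))
```

Both quoted lines are consistent with this reading: the computed values round to 0.949 and 0.925, matching the fourth column in each line.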
RE: TEXTAREA style="visibility: hidden"
JD Smith wrote: > > So, what exactly is bayes poison? "Bayes poison" is a collection of random words or text selections that have nothing to do with the email subject and are only there in an attempt to confuse the Bayes database. This doesn't really work the way the spammers would like to think it does, but they keep doing it anyway. -- Bowie
RE: TEXTAREA style="visibility: hidden"
So, what exactly is bayes poison? Best regards, JD Smith -----Original Message----- From: Magnus Holmgren [mailto:[EMAIL PROTECTED] Sent: Thursday, April 13, 2006 8:58 AM To: users@spamassassin.apache.org Subject: TEXTAREA style="visibility: hidden" I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor at SARE. -- Magnus Holmgren
Re: TEXTAREA style="visibility: hidden"
Magnus Holmgren wrote: > I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to > hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in > general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor > at SARE. It certainly seems worth testing. Here's a rule I wrote (caution: word-wraps.. this should be 3 lines long): rawbody L_STYLE_HIDDEN /<textarea[^>]{0,50}style\s?=\s?"\s?visibility:\s?hidden\s?"[^>]{0,50}>/i describe L_STYLE_HIDDEN has text with hidden visibility style score L_STYLE_HIDDEN 0.1 I added some allowance for other declarations in the textarea tag, and the insertion of whitespace at various spots... It may need further tweaking/tuning, but it's a first-stab.
TEXTAREA style="visibility: hidden"
I see a fair amount of spam using <TEXTAREA style="visibility: hidden"> to hide bayes poison. Shouldn't a rule against that, or CSS-hidden text in general, be worthwhile? I couldn't find any in the default 3.1.1 ruleset, nor at SARE. -- Magnus Holmgren
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
On Thursday, 13 April 2006 13:35 Mark Martinec wrote: > Agreed, this rule is completely inappropriate, it penalizes valid > encoding according to RFC 2047 and fires on any lengthier Subject > line in non-English language. It should disappear or have a > much reduced default score. The problem seems to be that 1) most spam is English, 2) most people contributing mass-checks are English-speaking, and 3) therefore most ham and spam tested in mass-checks is English. To improve the situation, more mass-check testers with non-English ham and spam should contribute; see http://wiki.apache.org/spamassassin/MassCheck?highlight=%28mass%29 I'm not an SA dev, but I think they once wrote that more contributors would be welcome. I do mass-checks, and if somebody wants to help, I have a working script you can have in order to contribute to testing. It's a simple setup, and then your server has some work to do overnight. On mine, it's about 1 hour per night, so no problem. Best regards, zmi -- // Michael Monnerie, Ing.BSc- http://it-management.at // Tel: 0660/4156531 .network.your.ideas. // PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import" // Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE // Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
Kai Schaetzl wrote: > > I just saw that a normal Ebay outbid notice hit two high-score rules. One > > is from sare-spoof and I already contacted the maintainer. But one is in > > the default 3.1.1 ruleset and I think this rule should get completely > > removed or get a score of 0. It's > > 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice Alan Premselaar: > This utterly wreaks havoc on just about all Japanese email, so I dropped > the score to nearly nothing. Agreed, this rule is completely inappropriate, it penalizes valid encoding according to RFC 2047 and fires on any lengthier Subject line in non-English language. It should disappear or have a much reduced default score. Mark
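To illustrate why a longer non-ASCII Subject ends up with more than one encoded-word: RFC 2047 limits each encoded-word to 75 characters, so a compliant mailer has to split a long subject into several of them. The Python sketch below uses the standard email.header module to show this. The sample subject is invented, and the ENCODED_WORD pattern is only a stand-in for whatever the real SUBJECT_ENCODED_TWICE rule matches, which is not shown in this thread.

```python
import re
from email.header import Header, decode_header, make_header

# Invented example of a longer German subject line.
subject = ("Überraschung: Sie wurden überboten für den Artikel "
           "Nähmaschine mit Zubehör und Originalverpackung")

# Encode as a mailer would; long values are split into several
# encoded-words, folded across header lines.
encoded = Header(subject, "utf-8").encode()

# Hypothetical detector: count "=?charset?B?...?=" / "=?charset?Q?...?=".
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bq]\?[^?\s]*\?=", re.I)
words = ENCODED_WORD.findall(encoded)

# Decoding the folded header gives back the original subject.
decoded = str(make_header(decode_header(encoded)))

print(len(words))
print(encoded)
```

A check that merely counts encoded-words would therefore fire on this perfectly valid header, which is the behaviour being complained about above.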
Re: xxxl spam
Mark Martinec wrote: I guess Windows Server 2003 is reported as Windows 2000, but I don't know. Certainly a couple of very large sites are seen as Windows 2000. In the UNKNOWN category there must be a mix of Windows and Unix hosts, not sure what is unusual about them. Mark Hmm... FWIW: [EMAIL PROTECTED] dos]$ sudo p0f -i eth1 p0f - passive os fingerprinting utility, version 2.0.4 (C) M. Zalewski <[EMAIL PROTECTED]>, W. Stearns <[EMAIL PROTECTED]> p0f: listening (SYN) on 'eth1', 223 sigs (12 generic), rule: 'all'. 24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3 -> 66.98.221.156:25 (distance 1, link: ethernet/modem) 66.98.221.156:2602 - Windows 2000 SP4, XP SP1 -> 24.141.168.241:783 (distance 19, link: ethernet/modem) 24.141.168.241 is Windows XP Pro SP1 66.98.221.156 is Windows Server 2003 SP1 (Standard Edition) Daryl
Re: xxxl spam
Wolfgang, Loren, > > real mail servers (those that deliver the ham part of mail) rarely ever > > run XP but that this OS is the best candidate for creating a spam zombie > Not completely unreasonable. XP is targeted within MS as a personal or > very small company OS. The equivalent of a linux/unix system used by more > than a single person would typically be some version of Server 2003. Which > was probably identified in the stats as Windows 2000. > > I'd like to venture the suggestion that the percentage of spam from XP > isn't necessarily an indication of inherent buggyness. It is more an > indication that it is an OS for Clueless Noobs who haven't a clue about > maintaining a system, avoiding a virus, or even able to tell if they have a > viruis. Thes are the machines that turn into zombies. I fully agree. In this view the following two lines should be seen as well: p0f OS guessham : spam Linux58.8 % : 41.2 % Unix 80.3 % : 19.7 % Linux is used by masses (compared to other Unix OS types) because it is considered to be easier to set up. Eventually this also means that less care is invested in prevention of being used to propagate spam. Still, a "score L_P0F_Unix -1.0" seems to be doing a good job here. Daryl, > I'm not sure the ham hit rate from the Windows-XP category scales (to > other installations) very well. The last time I looked into using p0f > to fingerprint connecting hosts, last spring, I seem to recall that > Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint > identically. > > While it'd be nice to be score "Windows-XP" hosts harshly, there's a lot > of mail coming from Windows Server 2003 hosts that would get hit. There is indeed a handful of valid small sites classified by p0f as Windows XP from which we do receive regular mail (well, newsletters and such, but still, should be treated mostly as ham). 
I don't see adding few score points to them much different than other (some quite arbitrary) rules - each rule tries to have low FP rate, but it often is not zero. Only a collection of all rules has merit. > I know for some of my systems 1:99 would be really low if Windows Server > 2003 and XP are identified the same. 40:60 (and in some cases 80:20) > would be closer to what I often see if I were to assume that all spam > came from Windows XP hosts. > Maybe you don't receive much, if any, mail from Windows Server 2003 hosts? I guess Windows Server 2003 is reported as Windows 2000, but I don't know. Certainly a couple of very large sites are seen as Windows 2000. In the UNKNOWN category there must be a mix of Windows and Unix hosts, not sure what is unusual about them. Mark
relaydb and tarpit
Sorry for cross-posting, but this is a program that could be useful to postfix and/or SA users. http://www.benzedrine.cx/relaydb.html Does anybody use or know about this program with tarpitting? It sounds very interesting, and for the author it seems to work, but I'd like to know if others have had good or bad experiences with it. After all, we're all fighting spammers, and if there are solutions that really work, I'm ready to implement them on our servers. Best regards, zmi -- // Michael Monnerie, Ing.BSc- http://it-management.at // Tel: 0660/4156531 .network.your.ideas. // PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import" // Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE // Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: sa missed to scan some of email
David B Funk engineering.uiowa.edu> writes: > Exactly so. > Usually you can find the related message by matching the time-stamp > from your maillog to your spamd log. You can also do some detective work, > eliminate maillog entries that have an incoming msgid (IE one from the > sending MTA) and just concentrate on those that have a locally added > msgid. > > Dave > Thanks for the help. It seems you're correct: based on the timestamp search, most of the unknown msgids in spam.log had a msgid like '[EMAIL PROTECTED]' in the maillog.
Re: xxxl spam
> to read this in other words: while certain analysts (and definitely microsoft marketing) > claim that about 50 % of all servers are running windows, these figures tend to say that > real mail servers (those that deliver the ham part of mail) rarely ever run XP > but that this OS is the best candidate for creating a spam zombie Not completely unreasonable. XP is targeted within MS as a personal or very small company OS. The equivalent of a linux/unix system used by more than a single person would typically be some version of Server 2003, which was probably identified in the stats as Windows 2000. I'd like to venture the suggestion that the percentage of spam from XP isn't necessarily an indication of inherent bugginess. It is more an indication that it is an OS for Clueless Noobs who haven't a clue about maintaining a system or avoiding a virus, and can't even tell if they have a virus. These are the machines that turn into zombies. If there were as many linux machines in the hands of Clueless Noobs, I'd bet that the number of infected linux systems would be in a similar percentage range. Remember, these XP systems are virtually all run with Administrator (aka root) privs all the time, by people that haven't a clue what that means. What would happen if all linux-like systems ran that way? Loren
Re: Rawbody rules information
Nigel Marshall wrote: > Hi List, > > I am looking to understand more about the raw body rules, and examples > of them that I could follow to hopefully write a few for myself. Can > someone point in a good place to start or a good tutorial on this sort > of thing? A rawbody rule is pretty much the same as a body rule. The difference is that HTML tags and newlines are still present. http://wiki.apache.org/spamassassin/WritingRules That said, do you really need to write a rawbody rule? Are you sure a body or uri rule won't do instead? I generally try to avoid writing rawbody rules unless I need to write something that falls into one of these two categories: 1) examines HTML tags directly (and not just the target of a URI) 2) examines newline insertion patterns.
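A quick way to picture the difference described above is to compare what a pattern can see before and after markup is stripped. The Python sketch below is only an illustration: rough_render is a made-up helper that crudely drops tags and collapses whitespace, whereas SA's real body rendering does considerably more, and the sample message is invented.

```python
import re


def rough_render(html):
    """Crude stand-in for the text a body rule sees: replace tags with
    a space, then collapse all whitespace (including newlines)."""
    text = re.sub(r'<[^>]*>', ' ', html)
    return re.sub(r'\s+', ' ', text).strip()


# Invented sample: visible text plus a hidden block spanning two lines.
raw = '<p>Buy now</p>\n<div style="display: none">random\nfiller words</div>'
rendered = rough_render(raw)

# A pattern that looks at the tag itself only has something to match
# in the raw form, which is the case for using rawbody.
tag_pattern = re.compile(r'style="display:\s*none"', re.I)

print(tag_pattern.search(raw) is not None)       # raw view keeps the tag
print(tag_pattern.search(rendered) is not None)  # rendered view does not
print(rendered)
```

Patterns that only care about the words in the message work fine on the rendered form, which fits the advice above to prefer body or uri rules unless tags or newline placement are what is being examined.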
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Kai Schaetzl wrote: > I just saw that a normal Ebay outbid notice hit two high-score rules. One > is from sare-spoof and I already contacted the maintainer. But one is in > the default 3.1.1 ruleset and I think this rule should get completely > removed or get a score of 0. It's > > 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice > > From grepping the rules it does what it says: it checks if there are two > B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or > at all? This is absolutely valid Q/B encoding and actually *required* by > RFC if your subject line is longer than 80 (or was it 72?) characters > (minus the encoding, so it's actually more like a 60 raw character limit). > This rule will hit on *lots* of non-ASCII mail and on almost all mail > coming from Ebay Germany. > > There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which > are "similar". QP scores 0 and BASE64 scores 0.449. This is much more > reasonable. > > Kai > This utterly wreaks havoc on just about all Japanese email, so I dropped the score to nearly nothing. alan -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFEPfgmE2gsBSKjZHQRAt82AKDAY4xTmST0kaY5cje1xH1ScDajOACg6fMH msifLKqJuv1IpudxbKGDcfQ= =ZDQE -END PGP SIGNATURE-