Re: Spam and the Internet [Was: xxxl spam]
Matt Kettler wrote:

...snip...

Here's one, if you want to see it: http://mywebpages.comcast.net/mkettler/spam.jpg

There's pretty close to zero chance that anyone in the US is going to hop on a plane and fly to Guatemala to buy ordinary lawn care products from a small store. But that's the kind of ads I'm getting.

but they've got heart-shaped pancake molds... you wouldn't fly to Guatemala for that? and at Q.29?! what a bargain!

(heh, i couldn't resist)
Re: Non-English languages (was: xxxl spam)
On Apr 13, 2006, at 9:46 PM, Kenneth Porter wrote: On Thursday, April 13, 2006 10:32 PM -0600 Paul R. Ganci [EMAIL PROTECTED] wrote: Unfortunately I am still a linguistic idiot and only speak English ... a Buffalo, NY version at that! My grand parents came over from Italy in 1920 and promptly stopped speaking Italian around my parents. It forced my parents to learn English at the cost of never learning Italian. There is plently of room to accomodate two languages but neither the US education system or home life is set up to do it. Same here. I took a couple years of high school Spanish in California and the classes dragged so incredibly slowly that I learned just a little vocabulary and the most basic of grammar, and still led the class. I usually finished my physics homework in that class while waiting for everyone to catch up. As a programmer I envy my professional peers who can speak Japanese and other non-European languages. My interest in programming languages extends to natural languages, and I find their differences fascinating. To those of you who've successfully learned 2nd and 3rd languages as an adult, what do you recommend for accomplishing that? I wish I had stuck with German in HS. And I wish I had taken the time to learn Latin and/or Greek back when I had all of that free time on my hands in HS. These days, it seems like everyone* ought to know (in addition to English) Spanish, and then a choice of French, Chinese, or Japanese. (* in the US, I don't mean globally; globally, I'd probably say that we should all know 3 out of those 5, but that's just me making wild-a*s-suggestions for a world that doesn't care about my opinion ;-) ) And, reiterating Kenneth's question: Anyone have advice for an almost middle-aged person who wants to go about expanding his natural language capabilities? (Hmm.. that's probably a dumb question for me.. I think all of those are taught at the university where I work... 
and can take free classes; could add Italian, Latin, and Greek too...; still for everyone who doesn't work for a University, but who has a similar thought, it's a good question to ponder)
Re: xxxl spam
mouss wrote:

and I've got plenty of users that speak multiple languages, not all of which use plain-ascii.

I guess so. now I'm not sure our situation isn't worst because people tried to find non standard solutions that are still used. I still remember the days when some customers were asking us to fix our software because it broke their accents... hopefully these times are gone, but I still see broken mail (much more than I should). actually, I also see mail that doesn't get rendered correctly on thunderbird. so I'll admit that the issue isn't really about accented chars...

This is a real sore point for me. I worked on the MIME quoted-printable encoding 14 years ago, and in some ways we haven't come nearly as far as we should have (see my posts as [EMAIL PROTECTED] when I was at France Telecom).

A lot of it has to do with idiots like Microsoft pushing competing standards (like Windows-1252) that offer no advantage whatsoever over the established standards (like ISO Latin-1) and serve only to increase the exponential problem of interoperability matrices... the number of ways each agent must be tested against other agents, etc... thereby guaranteeing that complete testing of all possible permutations becomes an unattainable goal receding ever more quickly towards the horizon. Where we could have been smart and limited ourselves to a manageable and very finite set of permutations instead...

This is why our site has the following rule:

    # don't allow windows-125x text attachments...
    mimeheader __CTYPE_MH_WIN1252  Content-Type =~ /charset=\"windows-125[0-8]\"/i
    meta       L_WIN_CHARSET       ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
    describe   L_WIN_CHARSET       Content-Type is Windows-specific text
    score      L_WIN_CHARSET       0.1

should probably do the same for non-MIME content, but it's not as much of a problem since Outlook prefers MIME content.

If anyone wants to talk to us, they can stick with ISO Latin-1. We don't need no stinkin' Windows-125x...
(or -839 for that matter). -Philip
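
For anyone who wants to try the same check outside SpamAssassin, here is a rough Python sketch of what the rule above tests: a text/plain or text/html part whose Content-Type declares a windows-125x charset. The function name and the sample message are made up for illustration, and unlike the site rule this version also accepts an unquoted charset value, which is an assumption on my part.

```python
import re
from email import message_from_string

# Same idea as the site rule, but the quotes around the charset are optional
# here (an assumption; the rule as posted appears to require them).
WIN125X = re.compile(r'charset="?windows-125[0-8]"?', re.IGNORECASE)


def windows_charset_parts(raw_message):
    """Return (content_type, charset) for each text part declaring windows-125x."""
    msg = message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # the rule only looks at plain-text and HTML parts
        header = part.get("Content-Type", "")
        if WIN125X.search(header):
            hits.append((ctype, part.get_content_charset()))
    return hits


# Hypothetical sample messages, not taken from the thread.
WINDOWS_MAIL = (
    "From: someone@example.com\n"
    'Content-Type: text/plain; charset="Windows-1252"\n'
    "\n"
    "hello\n"
)
LATIN1_MAIL = (
    "From: someone@example.com\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "\n"
    "hello\n"
)

if __name__ == "__main__":
    print(windows_charset_parts(WINDOWS_MAIL))
    print(windows_charset_parts(LATIN1_MAIL))
```

As with the 0.1 score in the rule, a hit here is best treated as a weak hint rather than a verdict.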
Re: Non-English languages (was: xxxl spam)
On Thu, 2006-04-13 at 23:38, John Rudd wrote: And, reiterating Kenneth's question: Anyone have advice for an almost middle-aged person who wants to go about expanding his natural language capabilities? There was an article in Newsweek a few weeks back about language immersion vacations. Here's the related msnbc story: http://msnbc.msn.com/id/11481528/ -Roger
Re: xxxl spam
On Friday, 14 April 2006 06:32 Paul R. Ganci wrote:

Start young when it is easy for kids to pick up the sounds.

Yes, my daughter has the advantage of learning German with me, French with my wife, and later at school she will learn English anyway. Still, people in Belgium have it easier: in addition to en, de, fr, they learn Dutch and their local flavor, a mix of all languages (which Dutch already is anyway).

The funniest party concerning languages I had was on Crete (an island of Greece): it was a party where all the tourist guides were, about 20 people and at least 9 different languages, where each could speak at least 2, often 4... now that's a mess :-)

mfg zmi
-- 
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: Non-English languages (was: xxxl spam)
[2006-04-14 08:38:46] John Rudd, I wish to start by greeting the list; I am a recent addition and I have been lurking for the past two weeks. You guys already make enough traffic. :-) JR And, reiterating Kenneth's question: Anyone have advice for an almost JR middle-aged person who wants to go about expanding his natural language JR capabilities? I am an Esperanto speaker. There are many reasons to give it a try. These are pretty much universally accepted: For one, it's quite simple to learn, for those who already know an indo-european language; after a couple months you'll be able to sustain a decent conversation. It also helps recognizing and understanding other languages. There have been experiments on this. There are also social and personal reasons. I won't enter into this, though. If you are really interested you'll find them out by yourself. I will only say that I found the language really intriguing, very expressing, and fun. A few pointers, http://www.esperanto.se/dok/praguemanifesto.html http://www.lernu.net Cordialità / Best regards / Gxis la Manuel Giorgini [EMAIL PROTECTED], Programmatore INTERLOGICA e-business solutions - http://www.interlogica.net Via Fusinato, 27 - IT 30171 Mestre VE - Italia - Unione Europea Tel +39 041 099 30 00 (6 linee r.a.) - Fax +39 041 504 11 72
Re: Non-English languages (was: xxxl spam)
On Friday, 14 April 2006 06:46 Kenneth Porter wrote:

To those of you who've successfully learned 2nd and 3rd languages as an adult, what do you recommend for accomplishing that?

There are books called Assimil, because you just assimilate the language with them, learning in a very natural way by speaking full sentences from the beginning. It looks very complicated at first, but is really quite easy after that. I've managed to learn Greek in a very short time with it, and now I'm struggling with French (which is quite hard though).

http://www.assimil.com/

mfg zmi
-- 
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: Non-English languages (was: xxxl spam)
[2006-04-14 06:46:51] Kenneth Porter, KP To those of you who've successfully learned 2nd and 3rd languages as an KP adult, what do you recommend for accomplishing that? As soon as you finish the basic/intermediate courses, find a penpal, or more than one, as soon as you can. With the Internet it's quite easy. A friend of mine picks out foreign people willing to learn Italian and they help each other this way. There are websites set up for this, if I'm not mistaken. Practising really helps. Cordialità / Best regards / Gxis la Manuel Giorgini [EMAIL PROTECTED], Programmatore INTERLOGICA e-business solutions - http://www.interlogica.net Via Fusinato, 27 - IT 30171 Mestre VE - Italia - Unione Europea Tel +39 041 099 30 00 (6 linee r.a.) - Fax +39 041 504 11 72
Re: xxxl spam
On Apr 14, 2006, at 12:40 AM, Michael Monnerie wrote: On Freitag, 14. April 2006 06:32 Paul R. Ganci wrote: Start young when it is easy for kids to pick up the sounds. Yes, my daughter has the advantage of learning german with me, french with my wife, and later at school she will learn english anyway. Still, people in Belgium have it more easy: in addition to en,de,fr, they learn dutch and their local flavor, a mix of all languages (which dutch is already anyways). The most funny party concerning languages I had was on Crete (and island of Greece): It was a party where all the tourist guides were, about 20 people and at least 9 different languages, where each could speak at least 2, often 4... now that's a mess :-) My favorite story isn't that extreme. It's about a friend of mine who went and did his senior year of HS in study abroad. He had learned German in HS, but was sent to Denmark (close) and spent that year learning the language. When he came back, there was this big party thing in Washington DC for all of the exchange students going in both directions. He came back to the US not having spoken any English for a year, and was put in a hotel room with someone who had been in Germany not speaking English for a year, and a German who had been speaking only English for a year. So, none of them was entirely comfortable going back to speaking their native language yet, none of them had been speaking the same language as the other two during that year ... and they stayed up all night talking. At first, each just spoke the language they had been speaking for the year, and the other two just understood. I think Daniel said that by morning, he was speaking English again :-}
Re: xxxl spam
Hi,

to read this in other words: while certain analysts (and definitely Microsoft marketing) claim that about 50 % of all servers are running Windows, these figures tend to say that real mail servers (those that deliver the ham part of mail) rarely ever run XP, but that this OS is the best candidate for creating a spam zombie.

Wolfgang Hamann

p0f OS guess      ham   :  spam
--------------------------------
Windows-XP        0.7 % : 99.3 %
Windows-2000      5.8 % : 94.2 %
UNKNOWN          16.5 % : 83.5 %
Linux            58.8 % : 41.2 %
Unix             80.3 % : 19.7 %
(Unix+Linux      66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!! It is ideal information to contribute two or three score points.
Re: xxxl spam
Mark Martinec wrote:

The most interesting part in my view is not the IP distance, but the type of OS, illustrated by the following table (derived from the same data as fig2):

p0f OS guess      ham   :  spam
--------------------------------
Windows-XP        0.7 % : 99.3 %
Windows-2000      5.8 % : 94.2 %
UNKNOWN          16.5 % : 83.5 %
Linux            58.8 % : 41.2 %
Unix             80.3 % : 19.7 %
(Unix+Linux      66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!! It is ideal information to contribute two or three score points.

I'm not sure the ham hit rate from the Windows-XP category scales (to other installations) very well. The last time I looked into using p0f to fingerprint connecting hosts, last spring, I seem to recall that Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint identically. While it'd be nice to score Windows-XP hosts harshly, there's a lot of mail coming from Windows Server 2003 hosts that would get hit.

I know for some of my systems 1:99 would be really low if Windows Server 2003 and XP are identified the same. 40:60 (and in some cases 80:20) would be closer to what I often see if I were to assume that all spam came from Windows XP hosts. Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?

Daryl
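
The arithmetic behind this objection is just a weighted average, so it can be sketched in a few lines of Python. The first helper blends the ham percentages of several groups of hosts that end up under one fingerprint; the second works backwards from the table's own combined Unix+Linux row. The message volumes in the XP / Server 2003 example are invented purely for illustration, and the function names are mine.

```python
def blended_ham_percent(groups):
    """Ham percentage seen for one fingerprint shared by several host groups.

    groups is a list of (message_count, ham_percent) pairs.
    """
    total = sum(count for count, _ in groups)
    ham = sum(count * pct / 100.0 for count, pct in groups)
    return 100.0 * ham / total


def implied_share(pct_a, pct_b, pct_combined):
    """Fraction of the combined traffic that must belong to group A."""
    return (pct_combined - pct_b) / (pct_a - pct_b)


# From the table: Linux 58.8 % ham, Unix 80.3 % ham, combined 66.5 % ham.
# That combination implies roughly 64 % of the Unix+Linux mail was Linux.
linux_share = implied_share(58.8, 80.3, 66.5)

# Hypothetical volumes (NOT from the thread): 1000 messages from XP zombies
# at 0.7 % ham, plus 650 all-ham messages from Server 2003 mail servers that
# p0f cannot tell apart from XP. The shared fingerprint then shows about
# 40 % ham, close to the 40:60 figure mentioned above.
mixed = blended_ham_percent([(1000, 0.7), (650, 100.0)])

if __name__ == "__main__":
    print(f"implied Linux share of Unix+Linux mail: {linux_share:.2f}")
    print(f"ham share for the mixed XP/2003 fingerprint: {mixed:.1f} %")
```

So the same fingerprint can look like 1:99 at one site and 40:60 at another, depending only on how much legitimate server traffic shares it.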
Re: xxxl spam
to read this in other words: while certain analysts (and definitely Microsoft marketing) claim that about 50 % of all servers are running Windows, these figures tend to say that real mail servers (those that deliver the ham part of mail) rarely ever run XP, but that this OS is the best candidate for creating a spam zombie

Not completely unreasonable. XP is targeted within MS as a personal or very small company OS. The equivalent of a linux/unix system used by more than a single person would typically be some version of Server 2003. Which was probably identified in the stats as Windows 2000.

I'd like to venture the suggestion that the percentage of spam from XP isn't necessarily an indication of inherent bugginess. It is more an indication that it is an OS for Clueless Noobs who haven't a clue about maintaining a system, avoiding a virus, or even telling whether they have a virus. These are the machines that turn into zombies. If there were as many linux machines in the hands of Clueless Noobs, I'd bet that the number of infected linux systems would be in a similar percentage range.

(Remember, these XP systems are virtually all run with Administrator (aka root) privs all the time, by people that haven't a clue what that means. What would happen if all linux-like systems ran that way?)

Loren
Re: xxxl spam
Wolfgang, Loren, real mail servers (those that deliver the ham part of mail) rarely ever run XP but that this OS is the best candidate for creating a spam zombie Not completely unreasonable. XP is targeted within MS as a personal or very small company OS. The equivalent of a linux/unix system used by more than a single person would typically be some version of Server 2003. Which was probably identified in the stats as Windows 2000. I'd like to venture the suggestion that the percentage of spam from XP isn't necessarily an indication of inherent buggyness. It is more an indication that it is an OS for Clueless Noobs who haven't a clue about maintaining a system, avoiding a virus, or even able to tell if they have a viruis. Thes are the machines that turn into zombies. I fully agree. In this view the following two lines should be seen as well: p0f OS guessham : spam Linux58.8 % : 41.2 % Unix 80.3 % : 19.7 % Linux is used by masses (compared to other Unix OS types) because it is considered to be easier to set up. Eventually this also means that less care is invested in prevention of being used to propagate spam. Still, a score L_P0F_Unix -1.0 seems to be doing a good job here. Daryl, I'm not sure the ham hit rate from the Windows-XP category scales (to other installations) very well. The last time I looked into using p0f to fingerprint connecting hosts, last spring, I seem to recall that Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint identically. While it'd be nice to be score Windows-XP hosts harshly, there's a lot of mail coming from Windows Server 2003 hosts that would get hit. There is indeed a handful of valid small sites classified by p0f as Windows XP from which we do receive regular mail (well, newsletters and such, but still, should be treated mostly as ham). I don't see adding few score points to them much different than other (some quite arbitrary) rules - each rule tries to have low FP rate, but it often is not zero. 
Only a collection of all rules has merit. I know for some of my systems 1:99 would be really low if Windows Server 2003 and XP are identified the same. 40:60 (and in some cases 80:20) would be closer to what I often see if I were to assume that all spam came from Windows XP hosts. Maybe you don't receive much, if any, mail from Windows Server 2003 hosts? I guess Windows Server 2003 is reported as Windows 2000, but I don't know. Certainly a couple of very large sites are seen as Windows 2000. In the UNKNOWN category there must be a mix of Windows and Unix hosts, not sure what is unusual about them. Mark
Re: xxxl spam
Mark Martinec wrote:

I guess Windows Server 2003 is reported as Windows 2000, but I don't know. Certainly a couple of very large sites are seen as Windows 2000. In the UNKNOWN category there must be a mix of Windows and Unix hosts, not sure what is unusual about them.

Mark

Hmm... FWIW:

[EMAIL PROTECTED] dos]$ sudo p0f -i eth1
p0f - passive os fingerprinting utility, version 2.0.4
(C) M. Zalewski [EMAIL PROTECTED], W. Stearns [EMAIL PROTECTED]
p0f: listening (SYN) on 'eth1', 223 sigs (12 generic), rule: 'all'.
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  -> 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  -> 24.141.168.241:783 (distance 19, link: ethernet/modem)

24.141.168.241 is Windows XP Pro SP1
66.98.221.156 is Windows Server 2003 SP1 (Standard Edition)

Daryl
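
If you want to tally guesses like these yourself, the p0f output above is regular enough to pick apart with one regular expression. This is only a sketch: it assumes the record layout shown in this message (source, OS guess, destination, distance, link), tolerates the arrow before the destination appearing as either "->" or a bare "-", and the function names are hypothetical.

```python
import re

IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# One p0f 2.x connection record, either on one line or wrapped onto two.
RECORD = re.compile(
    rf"(?P<src>{IP}):(?P<sport>\d+) - (?P<os>.+?)\s+->?\s+"
    rf"(?P<dst>{IP}):(?P<dport>\d+) "
    r"\(distance (?P<distance>\d+), link: (?P<link>[^)]+)\)"
)


def parse_p0f(text):
    """Return one dict per connection record found in p0f output."""
    return [
        {
            "src": m["src"],
            "os": m["os"],
            "dst": m["dst"],
            "dport": int(m["dport"]),
            "distance": int(m["distance"]),
            "link": m["link"],
        }
        for m in RECORD.finditer(text)
    ]


def smtp_senders(text):
    """Map source IP to OS guess, keeping only connections to port 25."""
    return {r["src"]: r["os"] for r in parse_p0f(text) if r["dport"] == 25}


# The two records quoted in this message.
SAMPLE = """\
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  -> 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  -> 24.141.168.241:783 (distance 19, link: ethernet/modem)
"""

if __name__ == "__main__":
    for ip, guess in smtp_senders(SAMPLE).items():
        print(ip, "=>", guess)
```

Note that the Server 2003 host shows up under a "Windows 2000 SP4, XP SP1" guess, which is exactly the ambiguity being discussed.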
Re: xxxl spam
On Apr 13, 2006, at 12:12 AM, Loren Wilton wrote: I'd like to venture the suggestion that the percentage of spam from XP isn't necessarily an indication of inherent buggyness. It is more an indication that it is an OS for Clueless Noobs who haven't a clue about maintaining a system, avoiding a virus, or even able to tell if they have a viruis. Thes are the machines that turn into zombies. While I don't disagree with your assessment of XP systems, I have a different hunch about why such a large percentage of the mail coming from XP systems is spam, and a smaller percentage of mail coming from the other systems is spam: a) In general, XP systems are not servers, and therefore, are not mail servers. b) Due to (a), if you do your mail/spam/virus scanning on machines that do not receive direct connections from your own clients (mail/spam/virus scanning at the border), OR if you do not have a high percentage of XP clients in your domain, then your scanning systems will not receive many (if any) legitimate direct connections from XP clients ... because a legitimate mail sending process on an XP system will be directly connecting to their own domain's mail server, and not to YOUR mail scanning systems. c) Thus, if you meed the conditions in (b), and if we accept (a) as true, then the vast majority of connections you receive from XP systems, on your mail scanning systems, will be from spam/virus bots trying to directly submit spam or virus laden messages to your mail gateways instead of submitting it to their own mail servers (as bots are known to do). We would expect to see a lower percentage of spam from server type OSes (or OSes that can be clients or servers) because a higher percentage of those platforms are used as legitimate mail servers. The other factor here is: while I _hate_ linux, how much of the spam being submitted by linux boxes is merely a mail server relaying on behalf of one of their infected clients? 
(same with the unix systems, and the 2000/2003 systems) And thus not at all indicative of the quality of linux systems administration out on the internet. I think this is one of those cases where the statistics work as blind observations of behavior, but attempting to describe _why_ the statistics works is not something you can sum up with a simple an straight forward explanation. Kinda like QM.
Re: xxxl spam
John Rudd wrote: While I don't disagree with your assessment of XP systems, I have a different hunch about why such a large percentage of the mail coming from XP systems is spam, and a smaller percentage of mail coming from the other systems is spam: a) In general, XP systems are not servers, and therefore, are not mail servers. b) Due to (a), if you do your mail/spam/virus scanning on machines that do not receive direct connections from your own clients (mail/spam/virus scanning at the border), OR if you do not have a high percentage of XP clients in your domain, then your scanning systems will not receive many (if any) legitimate direct connections from XP clients ... because a legitimate mail sending process on an XP system will be directly connecting to their own domain's mail server, and not to YOUR mail scanning systems. c) Thus, if you meed the conditions in (b), and if we accept (a) as true, then the vast majority of connections you receive from XP systems, on your mail scanning systems, will be from spam/virus bots trying to directly submit spam or virus laden messages to your mail gateways instead of submitting it to their own mail servers (as bots are known to do). We would expect to see a lower percentage of spam from server type OSes (or OSes that can be clients or servers) because a higher percentage of those platforms are used as legitimate mail servers. The other factor here is: while I _hate_ linux, how much of the spam being submitted by linux boxes is merely a mail server relaying on behalf of one of their infected clients? (same with the unix systems, and the 2000/2003 systems) And thus not at all indicative of the quality of linux systems administration out on the internet. I think this is one of those cases where the statistics work as blind observations of behavior, but attempting to describe _why_ the statistics works is not something you can sum up with a simple an straight forward explanation. Kinda like QM. 
ot I agree that statistics aren't the whole story. you can study the percentage of thiefs/criminals based on skin color and origin (some people already do it, and many jump to conclusions without studies). but you can do the same study based on social situation and past history of people. the first researcher will probably conclude that black/arabic/latin/... people are more criminal. the second researcher will instead conclude that criminality is more seen in poor communities, but that these aren't the worst criminals (killing vs stealing for instance). /ot back to xp and co. my feeling (no, I didn't run a study and won't) is that even if any study would show that we get more spam from XP than from linux, I will not use this to classify my mail. I am certain that if you do stats on mail date, you'll find that some dates correspond to more spam than others. we've already seen people jumping to block specific mailers (the bat for instance) based on their stats. I am also seing many legit mail trigering some SA rules (*_exess, no_real_name, x_library, ...). when I see this, I check the rule, and if I can't find a justification, I disable it.
Re: xxxl spam
On Apr 13, 2006, at 9:56 AM, mouss wrote: I am also seing many legit mail trigering some SA rules (*_exess, no_real_name, x_library, ...). when I see this, I check the rule, and if I can't find a justification, I disable it. I wouldn't do that. Just because legitimate mail triggers some rule doesn't mean that the rule is flawed. Using your example, triggering no_real_name does not mean that the message is spam, it means that the message has _some_ similarity to at least some spam messages (the higher the score, the stronger the similarity). And, that's absolutely true: statistically, when looking at the corpus which was used to create the rules database, a higher percentage of no_real_name messages were spam. Now, if legit messages were not just triggering those rules, but also triggering enough rules to be flagged as spam ... then I would lower the value of those rules, but not disable those rules. But I would only do that if I could see that there was a large percentage of should-be-ham messages being flagged as spam by that rule AND that rule wasn't being useful in flagging spam messages. The reason is: if the message is being flagged, but it shouldn't have been, then perhaps my corpus of messages differs significantly enough from the SA internal corpus that my score values need to be different. But that doesn't mean that the rules are so disjoint from tracking spam that they should be entirely disabled. They just don't have the same weighting that my corpus needs. If, instead, most messages passing through my mail servers, that triggered that rule, really did seem to be spam, then I wouldn't alter the score at all. I would just pass the should-have-been-ham message into my bayesian learner and hope that a low bayes score for messages like that would offset the rules had flagged it as spam.
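
The reasoning above leans on the fact that SpamAssassin adds up the scores of every rule a message hits and only flags the message when the total reaches the required score (5.0 by default). A toy Python sketch of that, with rule names and score values that are entirely hypothetical, shows why lowering one score, or letting a negative Bayes score offset it, can rescue a ham message without disabling the rule:

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold


def classify(hits, scores, required=REQUIRED_SCORE):
    """Sum the scores of the rules that hit; flag when the total reaches required."""
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total, total >= required


# Hypothetical rule names and scores, chosen only to make the sums easy to follow.
scores = {
    "NO_REAL_NAME_EXAMPLE": 1.0,
    "SUBJECT_ENCODED_EXAMPLE": 2.0,
    "BODY_PHRASE_EXAMPLE": 2.5,
    "BAYES_HAMMY_EXAMPLE": -2.5,
}

# A legitimate message that happens to hit three spam-ish rules.
ham_hits = ["NO_REAL_NAME_EXAMPLE", "SUBJECT_ENCODED_EXAMPLE", "BODY_PHRASE_EXAMPLE"]

# 1. One rule alone is nowhere near the threshold.
alone = classify(["NO_REAL_NAME_EXAMPLE"], scores)

# 2. All three together push the message over the line (a false positive).
before = classify(ham_hits, scores)

# 3. Lowering one score, rather than disabling the rule, is enough here.
tuned = dict(scores, SUBJECT_ENCODED_EXAMPLE=1.0)
after_tuning = classify(ham_hits, tuned)

# 4. Alternatively, leave the scores alone and let a learned Bayes result
#    pull the total down.
after_learning = classify(ham_hits + ["BAYES_HAMMY_EXAMPLE"], scores)

if __name__ == "__main__":
    for label, result in [
        ("one rule alone", alone),
        ("three rules", before),
        ("after lowering one score", after_tuning),
        ("after Bayes learning", after_learning),
    ]:
        print(f"{label}: total={result[0]:.1f} spam={result[1]}")
```

In every case the rule still fires and still contributes to catching real spam; only its weight relative to your own mail changes.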
Re: xxxl spam
mouss wrote: I also understand that US guys may get less encoded subjects, but at least in .fr, we have that all the time (because of our accented letters, and because many companies still use software that predates mime). and if I find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl. Sounds like we need more non-us based corpus contributors. After all, the SA devs can only work with what they get. Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the US. Last I checked he was in Ireland. Unfortunately this doesn't help with the encoding issue, as they still use ordinary English characters over there for most things. (I don't think Gaelic is very common in email.) So bear in mind that SA isn't just developed in the US by US citizens for US markets. However, it is true that the vast majority of the corpus currently comes from folks who speak English (King's or Yankee) as a primary language, and that's a bit of a problem as it creates considerable bias in the rules. And even us US folks do have encoding issues. After all, English is not our official language here in the US, and I've got plenty of users that speak multiple languages, not all of which use plain-ascii.
Re: xxxl spam
Matt Kettler wrote: mouss wrote: I also understand that US guys may get less encoded subjects, but at least in .fr, we have that all the time (because of our accented letters, and because many companies still use software that predates mime). and if I find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl. Sounds like we need more non-us based corpus contributors. After all, the SA devs can only work with what they get. Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the US. Last I checked he was in Ireland. Unfortunately this doesn't help with the encoding issue, as they still use ordinary English characters over there for most things. (I don't think Gaelic is very common in email.) So bear in mind that SA isn't just developed in the US by US citizens for US markets. oh, I never meant that. However, it is true that the vast majority of the corpus currently comes from folks who speak English (King's or Yankee) as a primary language, and that's a bit of a problem as it creates considerable bias in the rules. And even us US folks do have encoding issues. After all, English is not our official language here in the US, what do you mean here? what would be your official language? and I've got plenty of users that speak multiple languages, not all of which use plain-ascii. I guess so. now I'm not sure our situation isn't worst because people tried to find non standard solutions that are still used. I still remember the days when some customers were asking us to fix our software because it broke their accents... hopefully these times are gone, but I still see broken mail (much more than I should). actually, I also see mail that doesn't get rendered correctly on thunderbird. so I'll admit that the issue isn't really about accented chars...
Re: xxxl spam
mouss wrote: However, it is true that the vast majority of the corpus currently comes from folks who speak English (King's or Yankee) as a primary language, and that's a bit of a problem as it creates considerable bias in the rules. And even us US folks do have encoding issues. After all, English is not our official language here in the US, what do you mean here? what would be your official language? The United States of America does not have any official language. Americanized English is our common language, but it's not official. This means that our government has to supply forms and materials in many languages for its citizens, because it cannot require that citizens speak English. For example, we have tax forms in French: http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf Admittedly non-english forms and services are somewhat secondary here, but they are present. and I've got plenty of users that speak multiple languages, not all of which use plain-ascii. I guess so. now I'm not sure our situation isn't worst because people tried to find non standard solutions that are still used. I still remember the days when some customers were asking us to fix our software because it broke their accents... hopefully these times are gone, but I still see broken mail (much more than I should). actually, I also see mail that doesn't get rendered correctly on thunderbird. so I'll admit that the issue isn't really about accented chars... Well, yours is certainly worse, or at least more prevalent, than the problem here in the US, but I would not say it's the worst. Generally speaking the worst case seems to be present in smaller Asian nations, which have really extensive use of non-us characters. At least the French can restrict their text to the same character set as English and still be readable, although awkward due to the screwed up accents. 
Also, smaller Asian nations still to this day have a high prevalence of locally-grown mail clients, many of which are not even remotely RFC compliant, but work well with others in the same locale. They're also much more likely to make use of mixed-language text containing many character sets. Speaking 2 or 3 different languages is fairly common in the smaller countries of the Asian region, just due to necessity for trade with neighboring countries. Another area with this same basic issue would be the middle-east, but the number of completely different character sets is smaller.
Re: xxxl spam
On Apr 13, 2006, at 11:40 AM, mouss wrote: Matt Kettler wrote: And even us US folks do have encoding issues. After all, English is not our official language here in the US, what do you mean here? what would be your official language? The US doesn't have an official language. By default, it is assumed to be English for most things, but it's not Official. And, in some regions within the US, official govt signs and documents come in various languages (the reasons why this is true has to do with liability and legality; since there's no official language, you can't just pick _one_ language to publish your forms in, and be done with it; if you do, you're neglecting significant minority populations (and in some regions, those can be quite significant, such as spanish speakers in southern Florida or southern California), which then makes you vulnerable to law suits saying that you're discriminating and/or being negligent toward those significant minorities who aren't required to speak English, because English isn't an official language). In order to simplify this, some states have tried to enact official language legislation. Florida tried it. Someone put Make English the official state language on a ballot. The Cuban-American population in southern Florida got mad, and put Make Spanish the official state language on the ballot. Neither one passed, but the Spanish one got more votes. This pretty much silenced the English as state language movement in Florida, as their plan almost backfired on them. I don't remember any other state trying it since. The states where there wouldn't be any opposition don't need to make it a law ... and in states like California where it could matter (reducing costs in govt overhead by eliminating multiple languages and the requirement for multilingual workers), the English as state language supporters are afraid of what almost happened in Florida. So ... sorry for the long winded explanation, but that's what he was saying.
Re: xxxl spam
states like California where it could matter (reducing costs in govt overhead by eliminating multiple languages and the requirement for multilingual workers), the English as state language supporters are afraid of what almost happened in Florida. Considering that at last census a minority of 54% of California residents spoke Spanish as their primary or only language... I predict that the US will be the first country in the 21st century to abandon English as the national language, while almost all other countries seem to be mandating that their citizens learn English. Loren
Re: xxxl spam
Loren Wilton wrote: I predict that the US will be the first country in the 21st century to abandon English as the national language, while almost all other countries seem to be mandating that their citizens learn English. Loren The problem with the US is that we are linguistic idiots (a quote from a Columbia University German professor). If you go to Europe, in general they speak at least two languages fluently: English and the country's native language. I have had the opportunity to work in both Geneva, Switzerland and Milan, Italy. All business is conducted in English and everything else in Italian or, in the case of Switzerland, either German, Swiss German or French. Essentially all the engineers with whom I worked could speak two languages or in some cases four. I don't know what the big deal is. It shouldn't be one language but at least two here in the US. Start young when it is easy for kids to pick up the sounds. Unfortunately I am still a linguistic idiot and only speak English ... a Buffalo, NY version at that! My grandparents came over from Italy in 1920 and promptly stopped speaking Italian around my parents. It forced my parents to learn English at the cost of never learning Italian. There is plenty of room to accommodate two languages but neither the US education system nor home life is set up to do it. -- Paul ([EMAIL PROTECTED])
Non-English languages (was: xxxl spam)
On Thursday, April 13, 2006 10:32 PM -0600 Paul R. Ganci [EMAIL PROTECTED] wrote: Unfortunately I am still a linguistic idiot and only speak English ... a Buffalo, NY version at that! My grandparents came over from Italy in 1920 and promptly stopped speaking Italian around my parents. It forced my parents to learn English at the cost of never learning Italian. There is plenty of room to accommodate two languages but neither the US education system nor home life is set up to do it. Same here. I took a couple years of high school Spanish in California and the classes dragged so incredibly slowly that I learned just a little vocabulary and the most basic of grammar, and still led the class. I usually finished my physics homework in that class while waiting for everyone to catch up. As a programmer I envy my professional peers who can speak Japanese and other non-European languages. My interest in programming languages extends to natural languages, and I find their differences fascinating. To those of you who've successfully learned 2nd and 3rd languages as an adult, what do you recommend for accomplishing that?
Re: xxxl spam
Theo Van Dinter writes: On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote: Well, SA automatically ignores attachments in recent versions. However, hash-based plugins like razor, dcc, and pyzor work best when seeing all the attachments. For completeness, the first sentence isn't exactly true. SA automatically ignores attachments for the standard set of body, header, and uri rules, but it still has to read in the data, store it in the message tree internally, and make the entire message text available for full rules. There are also things like the AntiVirus plugin, etc, which may go ahead and decode attachments and do things with the data. I could easily see a plugin for ClamAV, or something scanning image files, etc. I think that at some point, the default size could go up, but I wouldn't try it for now. Matt Sergeant had a good trick in the qpsmtpd SpamAssassin plugin iirc -- it would download the entire message, but after a certain point (e.g. 250k) it would stop writing the incoming data to memory, and instead flush the remainder to a temporary file on disk. That way it could keep only the first 250k of messages, scanning that part, and once complete, reassemble the whole message as it wrote it back out. However there may be issues there -- e.g. consider a multipart/alternative message containing an innocent-looking 600k text/plain, followed by a 10k text/html spam payload. Common MUAs would display the latter, SpamAssassin would scan the former. That seems to be a vulnerability to me, although we already don't scan large messages _anyway_ ;) Also as Theo said, it fails in the face of any kind of message-body rewriting by SpamAssassin. --j.
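For readers who want to picture the "keep the first 250k, spill the rest to disk" trick described above, here is a minimal Python sketch. It is not the qpsmtpd plugin's code (that plugin is written in Perl and is only recalled from memory in the message); the function names `spool_message` and `reassemble`, the chunk size, and the use of an anonymous temporary file are all assumptions made for illustration.

```python
import tempfile


def spool_message(stream, head_limit=250 * 1024, chunk_size=64 * 1024):
    """Read a message from a binary stream.

    The first head_limit bytes are kept in memory (the part a scanner
    would look at); everything after that goes to a temporary file.
    Returns (head_bytes, overflow_file). overflow_file is None when the
    whole message fits within head_limit.
    """
    head = bytearray()
    overflow = None
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        room = head_limit - len(head)
        if room > 0:
            head += chunk[:room]
            chunk = chunk[room:]
        if chunk:
            if overflow is None:
                # Hypothetical choice: an anonymous temp file on disk.
                overflow = tempfile.TemporaryFile()
            overflow.write(chunk)
    if overflow is not None:
        overflow.seek(0)
    return bytes(head), overflow


def reassemble(head, overflow, chunk_size=64 * 1024):
    """Yield the original message bytes back in order: head, then overflow."""
    yield head
    if overflow is not None:
        while True:
            chunk = overflow.read(chunk_size)
            if not chunk:
                break
            yield chunk
        overflow.close()
```

Note that this only addresses memory use. It does nothing about the multipart/alternative concern raised above, since the scanner still sees only the head of the message, and it assumes the scanned part is written back out unmodified.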
Re: xxxl spam
That's excellent data! Mind if I forward that around to another list or two? The hops measurement is particularly interesting. Have you got that implemented as a working rule, in the field? is it expensive? --j. Mark Martinec writes: mouss wrote: since most filters skip large messages, it may be tempting for spammers to send large messages: I did some statistical analysis a few weeks ago with SA 3.1.1 (SA called from amavisd-new, but that is beside the point). Please see: http://www.ijs.si/software/amavisd/fig4.gif Shows spam score vs. mail size as a scattergram http://www.ijs.si/software/amavisd/fig5.gif Shows elapsed time for mail checking vs. mail size (shown is total time, but 90% of it reflects processing within SA and its plugins) As a curiosity (but off topic), harvesting results from p0f (passive operating system fingerprinting), here are two more: http://www.ijs.si/software/amavisd/fig1.gif Spam score vs. IP distance in hops (our server is in European academic network Geant) And perhaps most interesting of all (but again OT): http://www.ijs.si/software/amavisd/fig2.gif Spam score distribution as a percentage of all mail, separated by each sending mail client's operating system. Mark
Re: Spam and the Internet [Was: xxxl spam]
Matt Kettler writes: These spams I get from .gt don't offer any kind of online ordering. They are ads that you'd have to physically travel to the store in Guatemala to take advantage of them. They're ordinary weekly sales fliers for an ordinary local store that's so small that only 6 cars can park in front of it. (They have pictures of the store in some of them). Delivered to my mailbox as 1/2 meg .jpg files. It's really quite bizarre, and amusing. Here's one, if you want to see it: http://mywebpages.comcast.net/mkettler/spam.jpg wow. 3 of the cars are photoshopped in, btw. ;) --j.
Re: xxxl spam
Justin, Mark Martinec writes: As a curiosity (but off topic), harvesting results from p0f (passive operating system fingerprinting), here are two more: http://www.ijs.si/software/amavisd/fig1.gif Spam score vs. IP distance in hops (our server is in European academic network Geant) And perhaps most interesting of all (but again OT): http://www.ijs.si/software/amavisd/fig2.gif Spam score distribution as a percentage of all mail, separated by each sending mail client's operating system. That's excellent data! Mind if I forward that around to another list or two? I don't mind. The hops measurement is particularly interesting. Have you got that implemented as a working rule, in the field? is it expensive? Yes, implemented in the field - comes with the latest amavisd-new-2.4.0. It inserts one header field with collected information into the mail header, making it available to SA to score it as it wishes (custom rules, bayes). It could probably just as well be implemented as a SA plugin (making use of the supplied lightweight p0f-analyzer.pl interface to p0f), but it was easier for me to do it in amavisd-new, where the remote SMTP client's IP address is accessible directly, not needing to parse the header and understand topology. It is reasonably inexpensive: the cost of running the p0f utility is comparable to running tcpdump; it takes about one hour of CPU per month on our medium-busy mailer; the rest is negligible, with no additional latencies and no additional network traffic. The most interesting part in my view is not the IP distance, but the type of OS, illustrated by the following table (derived from the same data as fig2):

    p0f OS guess      ham   :  spam
    --------------------------------
    Windows-XP        0.7 % : 99.3 %
    Windows-2000      5.8 % : 94.2 %
    UNKNOWN          16.5 % : 83.5 %
    Linux            58.8 % : 41.2 %
    Unix             80.3 % : 19.7 %
    (Unix+Linux      66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!! This is ideal information to contribute two or three score points.
Traffic from a site's own PC clients must not be seen by p0f, otherwise one would be penalizing the site's own users. This can be achieved either by separating the MSA from the MTA, or by using a list of internal IP networks for exclusion.

A quick summary from amavisd-new-2.4.0 release notes:

- experimental support for passive operating system fingerprinting with the use of externally running utility p0f, supplying collected information as a header field to SpamAssassin, making possible to add rules to score SMTP client hosts based on educated guess about their operating system type and IP distance; see below for details;

Here are the installation details:

- passive operating-system fingerprinting (p0f) support lets SA gain information about SMTP client's operating system and estimated IP distance, and can reduce the number of bounces:

  * find and install the p0f utility: http://lcamtuf.coredump.cx/p0f.shtml or in FreeBSD ports collection as 'net-mgmt/p0f';

  * start a p0f process on the same host where MTA (MX) is running, making it listen only to incoming TCP sessions (to reduce its workload) to the IP address and TCP port (25) where MTA is accepting incoming mail from outside (it doesn't hurt to let it see other traffic too, it just isn't needed); after testing p0f alone and seeing that it works, you may start it up, feeding its output to program p0f-analyzer.pl that comes with amavisd-new package, e.g.:

      p0f -l 'tcp dst port 25' 2>&1 | p0f-analyzer.pl 2345

    on multi-homed boxes one may need to specify interface and IP address where MTA is listening, the filter syntax is the same as in tcpdump, e.g.:

      p0f -l -i bge0 'dst host 192.0.2.66 and tcp dst port 25' 2>&1 \
        | p0f-analyzer.pl 2345

  * the program p0f-analyzer.pl reads p0f reports on stdin, keeps a cache for a limited time (10 minutes, configurable) of data about incoming TCP sessions organized by remote IP address, and listens on UDP port 2345 (specified as its command line argument) for queries; only queries from allowed IP addresses are accepted and responded to, other queries are silently ignored - configure @inet_acl accordingly, defaults to 127.0.0.1;

  * adding the following line to amavisd.conf, matching the chosen port number to the one specified on the command line to the p0f-analyzer.pl:

      $os_fingerprint_method = 'p0f:127.0.0.1:2345';

    makes amavisd send queries to p0f-analyzer.pl (on the supplied IP address and UDP port number) to collect information about remote SMTP client's OS; collected response is then supplied as a header field when SpamAssassin is invoked; query/response is very quick and imposes no burden on amavisd process nor does it extend its processing time.

    The $os_fingerprint_method setting is also a member of policy banks to make it more flexible to disable fingerprinting for
xxxl spam
since most filters skip large messages, it may be tempting for spammers to send large messages:

- using a large but invisible part (either by using mime and putting a large text part in an alternative mime, or using invisible chars before their own text).
- using a large image
- large tail (spammers can append anything).
- unused attachments

questions:

- has this already been seen?
- how can we mitigate this?

my first thought would be to process the message before passing it to the filter. In particular, are there drawbacks/benefits if I remove attachments before passing them to SA (or any other filter)?
Re: xxxl spam
mouss wrote: since most filters skip large messages, it may be tempting for spammers to send large messages: - using a large but invisible part (either by using mime and putting a large text part in an alternative mime, or using invisible chars before their own text). - using a large image - large tail (spammers can append anything). - unused attachments questions: - has this already been seen? I've not seen it with dummy text, but I have seen the large image spam. However, it's very rare. The problem is that if you're a large-volume spammer, large messages take longer to send, and thus reduce your spams/minute. There's only one spammer that's done this to me. There's some group of stores in Guatemala that sends me high-res scans of their newspaper. Consejeros en Finanzas Empresariales, some kind of bank La Cuacao - some kind of electronics shop? or an eye doctor? cefesa hardware - a True Value hardware store. Why anyone in Guatemala thinks I'll visit their store to spend Q. 22 on a patio log fake fire log or Q. 85 on a generic brand weed and feed fertilizer is beyond me. But other than these guys, I don't get any spams > 250kb. - how can we mitigate this? Personally, I think it is largely self-mitigating. Their size greatly limits their potential distribution. As I see it, there's very little large-spam out there. my first thought would be to process the message before passing it to the filter. In particular, are there drawbacks/benefits if I remove attachments before passing them to SA (or any other filter)? Well, SA automatically ignores attachments in recent versions. However, hash-based plugins like razor, dcc, and pyzor work best when seeing all the attachments.
Re: xxxl spam
On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote: Well, SA automatically ignores attachments in recent versions. However, hash-based plugins like razor, dcc, and pyzor work best when seeing all the attachments. For completeness, the first sentence isn't exactly true. SA automatically ignores attachments for the standard set of body, header, and uri rules, but it still has to read in the data, store it in the message tree internally, and make the entire message text available for full rules. There are also things like the AntiVirus plugin, etc, which may go ahead and decode attachments and do things with the data. I could easily see a plugin for ClamAV, or something scanning image files, etc. I think that at some point, the default size could go up, but I wouldn't try it for now. -- Randomly Generated Tagline: Zoidberg: That's where I'm meeting Uncle Zoid for lunch to discuss my Hollywood dream. The next time you see me, don't be surprised if I've eaten.
Re: xxxl spam
Theo Van Dinter wrote: On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote: Well, SA automatically ignores attachments in recent versions. However, hash-based plugins like razor, dcc, and pyzor work best when seeing all the attachments. For completeness, the first sentence isn't exactly true. SA automatically ignores attachments for the standard set of body, header, and uri rules, but it still has to read in the data, store it in the message tree internally, and make the entire message text available for full rules. Fair enough... There are also things like the AntiVirus plugin, etc, which may go ahead and decode attachments and do things with the data. I could easily see a plugin for ClamAV, or something scanning image files, etc. I think that at some point, the default size could go up, but I wouldn't try it for now. FWIW, it might be worth considering the approach used by MailScanner. MailScanner still scans large messages, but truncates messages over "Max SpamAssassin Size". Presumably it does so in a manner that still has the correct mime boundaries, because I don't get any kind of superfluous rule hits regarding mime boundaries on large messages. I've currently got this set to 60k, but MailScanner defaults to 30k. Of course, this can't work if you're using any kind of encapsulation options in report_safe, but since MailScanner does all the markup itself, it doesn't hurt it to send Mail::SpamAssassin a truncated version. Converting this to the spamc/spamd model might be kind of difficult due to this, but it's worth considering for spamc -c.
Re: xxxl spam
On Tue, Apr 11, 2006 at 02:46:41PM -0400, Matt Kettler wrote: Of course, this can't work if you're using any kind of encapsulation options in report_safe, but since MailScanner does all the markup itself, it doesn't hurt it to send Mail::SpamAssassin a truncated version. Converting this to the spamc/spamd model might be kind of difficult due to this, but it's worth considering for spamc -c. It's been suggested before, but it doesn't quite work for SA unfortunately. SA is designed to be a generic mail filter, and some rules/plugins/etc expect to be able to see the entire original contents of the message, so we can't really trim off pieces. Also, things like spamc have no concept of what a message actually is, they just read in a bunch of data and send it somewhere, so the full message would have to be read in by spamd before anything could be trimmed off of it. At that point there's not a lot of savings in trimming off attachments (though the raw versions could potentially be stored in temp files instead of memory). And then, as you said, with encapsulation and such, we'd need the whole of the message anyway. -- Randomly Generated Tagline: NT is secure as long as you don't remove the shrink wrap. - G. Myers
Re: xxxl spam
Theo Van Dinter wrote: On Tue, Apr 11, 2006 at 02:46:41PM -0400, Matt Kettler wrote: Of course, this can't work if you're using any kind of encapsulation options in report_safe, but since MailScanner does all the markup itself, it doesn't hurt it to send Mail::SpamAssassin a truncated version. Converting this to the spamc/spamd model might be kind of difficult due to this, but it's worth considering for spamc -c. It's been suggested before, but it doesn't quite work for SA unfortunately. SA is designed to be a generic mail filter, and some rules/plugins/etc expect to be able to see the entire original contents of the message, so we can't really trim off pieces. Also, things like spamc have no concept of what a message actually is, they just read in a bunch of data and send it somewhere, so the full message would have to be read in by spamd before anything could be trimmed off of it. At that point there's not a lot of savings in trimming off attachments (though the raw versions could potentially be stored in temp files instead of memory). And then, as you said, with encapsulation and such, we'd need the whole of the message anyway. Agreed.. the only part of sa that this would be straightforward for would be spamc -c. At that point, spamc isn't piping the message back out, and isn't doing encapsulation, so truncation would be irrelevant.
Re: xxxl spam
mouss wrote: since most filters skip large messages, it may be tempting for spammers to send large messages: I did some statistical analysis a few weeks ago with SA 3.1.1 (SA called from amavisd-new, but that is beside the point). Please see: http://www.ijs.si/software/amavisd/fig4.gif Shows spam score vs. mail size as a scattergram http://www.ijs.si/software/amavisd/fig5.gif Shows elapsed time for mail checking vs. mail size (shown is total time, but 90% of it reflects processing within SA and its plugins) As a curiosity (but off topic), harvesting results from p0f (passive operating system fingerprinting), here are two more: http://www.ijs.si/software/amavisd/fig1.gif Spam score vs. IP distance in hops (our server is in European academic network Geant) And perhaps most interesting of all (but again OT): http://www.ijs.si/software/amavisd/fig2.gif Spam score distribution as a percentage of all mail, separated by each sending mail client's operating system. Mark
relay distance and spam [was xxxl spam]
Mark Martinec wrote: http://www.ijs.si/software/amavisd/fig1.gif Spam score vs. IP distance in hops (our server is in European academic network Geant) This one is amazing. there seems to be an empty space (most mail has nhops <= 10 or >= 14). I would guess that most ham with large nhops is from mailing lists. so the question is what the graph would look like if you take into account:

- mailing lists forwarding
- multiple internal hops at either sender or receiver (I have N Received headers added by my own MTA. and for mail fetched from an MSP, there are still more).

I would conjecture that most legitimate mail has two real hops (the sending MTA and the receiving MTA).
RE: relay distance and spam [was xxxl spam]
mouss wrote: I would conjecture that most legitimate mail has two real hops (the sending MTA and the receiving MTA). That would be one hop.
Re: xxxl spam
On Tuesday, April 11, 2006 2:14 PM -0400 Matt Kettler [EMAIL PROTECTED] wrote: I've not seen it with dummy text, but I have seen the large image spam. However, it's very rare. The problem being that if you're a large-volume spammer, large messages take a longer time to send, and thus reduce your spams/minute. You can also impose this cost on spammers by enabling the GreetPause feature in the more recent versions of sendmail. This tells sendmail not to answer right away when receiving a connection, and to drop the connection if anything is received before the greeting is sent out. This punishes slammer spammers who push the whole SMTP conversation through and then disconnect. It also ensures that every connection from an unknown sender takes a minimum amount of time. You can add exceptions in your access database for your customers and frequent correspondents. For example, this exception drops the GreetPause to zero for my LAN (example is for 10.123/16): GreetPause:10.123 0
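To make the GreetPause idea concrete, here is a minimal Python sketch of the check described above: wait a fixed time before sending the SMTP banner, and drop the connection if the client sends anything first. This is not sendmail's implementation; the function names, the 5-second default, and the banner hostname `mail.example.invalid` are assumptions chosen for illustration.

```python
import select
import socket


def client_talks_early(conn, pause_seconds):
    """Wait up to pause_seconds before greeting.

    Returns True if the socket became readable during the pause, i.e. the
    client sent data (or hung up) before seeing our greeting.
    """
    readable, _, _ = select.select([conn], [], [], pause_seconds)
    return bool(readable)


def greet(conn, pause_seconds=5.0,
          banner=b"220 mail.example.invalid ESMTP\r\n"):
    """Apply the greeting delay; drop early talkers, greet everyone else.

    Returns True if the banner was sent, False if the connection was dropped.
    """
    if client_talks_early(conn, pause_seconds):
        conn.close()
        return False
    conn.sendall(banner)
    return True
```

The access-database exception mentioned above would correspond to calling `greet` with `pause_seconds=0` for trusted networks. Note that this sketch blocks for the length of the pause, so a real server would need to handle other connections in the meantime.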
greetpause was Re: xxxl spam
Kenneth Porter wrote: You can also impose this cost on spammers by enabling the GreetPause feature in the more recent versions of sendmail. This tells sendmail not to answer right away when receiving a connection, and to drop the connection if anything is received before the greeting is sent out. This punishes slammer spammers who push the whole SMTP conversation through and then disconnect. It also ensures that every connection from an unknown sender takes a minimum amount of time. You can add exceptions in your access database for your customers and frequent correspondents. For example, this exception drops the GreetPause to zero for my LAN (example is for 10.123/16): GreetPause:10.123 0 Is this as effective as greylisting? -- Mr Michele Neylon Blacknight Solutions Quality Business Hosting Colocation http://www.blacknight.ie/ Tel. 1850 927 280 Intl. +353 (0) 59 9183072 Direct Dial: +353 (0)59 9183090 Fax. +353 (0) 59 9164239
Re: greetpause was Re: xxxl spam
You can also impose this cost on spammers by enabling the GreetPause feature in the more recent versions of sendmail. This tells sendmail not to answer right away when receiving a connection, and to drop the connection if anything is received before the greeting is sent out. This punishes slammer spammers who push the whole SMTP conversation through and then disconnect. It also ensures that every connection from an unknown sender takes a minimum amount of time. You can add exceptions in your access database for your customers and frequent correspondents. For example, this exception drops the GreetPause to zero for my LAN (example is for 10.123/16): GreetPause:10.123 0 Is this as effective as greylisting? Perhaps not, but it also doesn't have any of the drawbacks (ie, delayed mail, need to whitelist non-behaving servers, etc.). I recently enabled it on my servers, and it's been stopping a ton of mail without any complaints from legitimate senders.
Re: greetpause was Re: xxxl spam
Mike Jackson wrote: You can also impose this cost on spammers by enabling the GreetPause feature in the more recent versions of sendmail. This tells sendmail not to answer right away when receiving a connection, and to drop the connection if anything is received before the greeting is sent out. This punishes slammer spammers who push the whole SMTP conversation through and then disconnect. It also ensures that every connection from an unknown sender takes a minimum amount of time. You can add exceptions in your access database for your customers and frequent correspondents. For example, this exception drops the GreetPause to zero for my LAN (example is for 10.123/16): GreetPause:10.123 0 Is this as effective as greylisting? Perhaps not, but it also doesn't have any of the drawbacks (ie, delayed mail, need to whitelist non-behaving servers, etc.). I recently enabled it on my servers, and it's been stopping a ton of mail without any complaints from legitimate senders. greetpause only blocks some ratware spam. If I were to write spam and/or viruses, I would just add a sleep(x):

    given N victims, choose M among them:
    for (i = 0; i < M; i++)
        time(i) = connect to server(i)
    for (i = 0; i < M; i++)
        t = max(0, time(i) + min_sleep - now)
        sleep(t)
        send_junk()

(can be tuned if using multiple threads/processes). so greetpause will certainly stop some ratware spam, but is not a full solution. also, if your greetpause requires sleep()-ing on every connection, then it's not acceptable (for me) as this is a call for DoS. I am not aware of any async MTA [read: one that will not sleep, but will handle other connections in the meantime], at least in the open source world. If you are after miscreants, then partial-greylisting is probably more effective (I mean greylisting some of the connections, based on the client name, ip, behaviour, ... etc).
Re: relay distance and spam [was xxxl spam]
[EMAIL PROTECTED] wrote: mouss wrote: I would conjecture that most legitimate mail has two real hops (the sending MTA and the receiving MTA). That would be one hop. depends on how you count:

    MUA -> my MTA1 -> your MTA -> your mailbox

that's two MTAs, so that's two hops. I prefer to count it this way because this corresponds to Received headers. a direct mail would be

    MUA -> MTA -> mailbox

and is either:

- legitimate from trusted sources
- direct spam
- an exception

if you have an internal MTA and a relay host, or if you have an MTA and relay via an ISP, that adds a hop. If you can remove the reception hops (since you know them, you can ignore them in your computations), most legitimate cross-domain mail would be 2-hop mail (this is what I believe).
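As an illustration of counting hops by Received headers while ignoring the receiver's own known hops, here is a small Python sketch. The function name `count_mta_hops` and the substring-based `local_markers` matching are assumptions for this example; real Received parsing is considerably messier, and headers added by the sending side can be forged.

```python
from email import message_from_string


def count_mta_hops(raw_message, local_markers=()):
    """Count Received: headers in a raw message.

    Headers containing any string in local_markers (for example
    "by inner.example.org") are treated as the receiving site's own
    internal hops and are not counted.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])
    return sum(
        1
        for line in received
        if not any(marker in line for marker in local_markers)
    )
```

With this way of counting, a message that went MUA -> sending MTA -> receiving MTA carries two Received headers once the receiver's internal hops are excluded.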
RE: greetpause was Re: xxxl spam
mouss wrote: so greetpause will certainly stop some ratware spam, but is not a full solution. Agreed. Spammers have access to all the free CPU bandwidth and processing time they can steal - legitimate MTAs are limited to a budget. Any anti-spam solution that simply rewards CPU and bandwidth spent* is playing into the hands of the spammers. * Email stamps, "factor this product of large primes" challenges, greetpause -- Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902 Hispanic Business Inc./HireDiversity.com Software Engineer
Re: relay distance and spam [was xxxl spam]
mouss wrote: - multiple internal hops at either sender or receiver (I have N Received headers added by my own MTA. and for mail fetched from an MSP, there are still more). Actually, if I'm reading this right, it's the number of IP hops between the sending server and the receiving server -- in other words, how many lines you'd see if you were on the receiving server and ran traceroute to the sending MTA. I've rarely seen any messages that passed through more than 5 MTAs -- certainly not enough to account for the graph. But 10 routers between me and the sender? That doesn't seem unreasonable at all. -- Kelson Vibber SpeedGate Communications www.speed.net
Re: relay distance and spam [was xxxl spam]
On Tuesday, 11 April 2006 22:28, mouss wrote: [EMAIL PROTECTED] wrote: mouss wrote: I would conjecture that most legitimate mail has two real hops (the sending MTA and the receiving MTA). That would be one hop. depends on how you count: MUA -> my MTA1 -> your MTA -> your mailbox that's two MTAs, so that's two hops. I prefer to count it this way because this corresponds to Received headers. well, here it looks like this: MUA -> sender's MTA -> my external MTA -(fetchmail)-> my internal MTA -> one internal hop through spamassassin -> one internal hop through antivirus -> my MUA and at my workplace it's a similar setup, without the fetchmail. bye, MH
RE: relay distance and spam [was xxxl spam]
Kelson wrote: Actually, if I'm reading this right, it's the number of IP hops between the sending server and the receiving server -- in other words, how many lines you'd see if you were on the receiving server and ran traceroute to the sending MTA. Ah... that makes much more sense :) -- Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902 Hispanic Business Inc./HireDiversity.com Software Engineer
Spam and the Internet [Was: xxxl spam]
Matt Kettler wrote: There's only one spammer that's done this to me. There's some group of stores in Guatemala that sends me high-res scans of their newspaper. Consejeros en Finanzas Empresariales, some kind of bank La Cuacao - some kind of electronics shop? or an eye doctor? cefesa hardware - a True Value hardware store. Why anyone in Guatemala thinks I'll visit their store to spend Q. 22 on a patio log fake fire log or Q. 85 on a generic brand weed and feed fertilizer is beyond me. dunno, but I can tell you that the net is full of people who love me and wish me well. I keep winning all the lotteries. I can buy software at cheap prices (if someone can tell these guys that I have neither the time nor the need to use photoshop, that I already have windows+office, ... etc, that may save them some time/resources they can spend helping me in other areas:). others seem to need an urgent contact for an important relationship. I'm feeling like the ceo of a large company. Some even seem to know private info about me. It seems I need some special pills. but for now, the names of the pills seem to change all the time. I'll wait until they get an agreement on how to name them:). They also keep talking about inches. if someone can tell them that we use the metric system here, I would be grateful...
Re: Spam and the Internet [Was: xxxl spam]
mouss wrote: Matt Kettler wrote: Why anyone in Guatemala thinks I'll visit their store to spend Q. 22 on a patio log fake fire log or Q. 85 on a generic brand weed and feed fertilizer is beyond me. dunno, but I can tell you that the net is full of people who love me and wish me well. I keep winning all the lotteries. I can buy software at cheap prices (if someone can tell these guys that I have neither the time nor the need to use photoshop, that I already have windows+office, ... etc, that may save them some time/resources they can spend helping me in other areas:). others seem to need an urgent contact for an important relationship. I'm feeling like the ceo of a large company. Some even seem to know private info about me. It seems I need some special pills. but for now, the names of the pills seem to change all the time. I'll wait until they get an agreement on how to name them:). They also keep talking about inches. if someone can tell them that we use the metric system here, I would be grateful... Yeah, but all those actually have some chance of financial gain from someone located in another country. You'd have to be stupid, but it's possible, because all of those can close an electronic transaction with you. These spams I get from .gt don't offer any kind of online ordering. They are ads that you'd have to physically travel to the store in Guatemala to take advantage of. They're ordinary weekly sales fliers for an ordinary local store that's so small that only 6 cars can park in front of it. (They have pictures of the store in some of them). Delivered to my mailbox as 1/2 meg .jpg files. It's really quite bizarre, and amusing. Here's one, if you want to see it: http://mywebpages.comcast.net/mkettler/spam.jpg There's pretty close to zero chance that anyone in the US is going to hop on a plane and fly to Guatemala to buy ordinary lawn care products from a small store. But that's the kind of ads I'm getting.
Re: relay distance and spam [was xxxl spam]
On Tuesday April 11 2006 23:17, Kelson wrote: mouss wrote: - multiple internal hops at either sender or receiver (I have N Received headers added by my own MTA. and for mail fetched from an MSP, there are still more). Actually, if I'm reading this right, it's the number of IP hops between the sending server and the receiving server -- in other words, how many lines you'd see if you were on the receiving server and ran traceroute to the sending MTA. Exactly. It is usually the number of hops that a traceroute running on the MTA would show when tracing the route to the host from which it is receiving a message. (I say usually, because routes can be asymmetric, and what we actually observe is the remaining TTL field value in the IP packet, combined with an educated guess at its initial setting based on the detected OS type.) Btw, the horizontal spread of 1 unit (in fig1) is artificial white noise, added to spread the numerous dots somewhat for a better view. I guess we are somewhat lucky in seeing a rather clear-cut separation of nearby friendly and distant wild-world hosts, and can use IP distance to add a little score weight for distant hosts and subtract a little for nearby hosts. Mark
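For anyone who wants to play with the idea, here is a minimal sketch of the TTL arithmetic described above. It is not the code behind fig1. The list of common initial TTLs, the fallback guess (smallest common initial value not below the observed TTL, standing in for real OS detection), and the near/far thresholds and weight are all assumptions made up for illustration.

```python
# Initial TTL values commonly used by operating systems (assumed list,
# used here only as a fallback when the sender's OS is not known).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def estimate_hops(observed_ttl, initial_ttl=None):
    """Estimate IP hop distance from the remaining TTL of a received packet.

    If initial_ttl is not supplied (e.g. from OS fingerprinting), guess the
    smallest common initial TTL that is not below the observed value.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    if initial_ttl is None:
        initial_ttl = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    if initial_ttl < observed_ttl:
        raise ValueError("initial TTL cannot be below the observed TTL")
    return initial_ttl - observed_ttl


def distance_score(hops, near=5, far=15, weight=0.5):
    """Hypothetical scoring: small bonus for nearby hosts, small penalty
    for distant ones, nothing in between. Thresholds are illustrative."""
    if hops <= near:
        return -weight
    if hops >= far:
        return weight
    return 0.0


if __name__ == "__main__":
    for ttl in (62, 52, 117):
        hops = estimate_hops(ttl)
        print(ttl, hops, distance_score(hops))
```

The fallback guess goes wrong when a host is more hops away than the gap between two common initial values, which is one more reason to keep the score contribution small.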
RE: greetpause was Re: xxxl spam
On Tuesday, April 11, 2006 1:37 PM -0700 [EMAIL PROTECTED] wrote: Agreed. Spammers have access to all the free CPU bandwidth and processing time they can steal - legitimate MTAs are limited to a budget. Any anti-spam solution that simply rewards CPU and bandwidth spent* is playing into the hands of the spammers. The original concern was that spammers would use larger messages to avoid the size cutoff in SA, but this was countered because spammers have to reduce their message rate to send larger messages. Server-side, GreetPause (and greylisting) forces a client to reduce its message rate. If the client has unlimited bandwidth and doesn't care about the reduced message rate, it might as well shovel giant messages. In for a penny, in for a pound.
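A back-of-the-envelope sketch of that last point, under a deliberately crude model: one connection at a time, and each message costs a fixed server-imposed delay plus its transfer time. The function name, the 100 kB/s bandwidth, the message sizes, and the 30-second pause are hypothetical numbers chosen for illustration, not measurements.

```python
def messages_per_hour(size_bytes, bandwidth_bytes_per_s, delay_s=0.0):
    """Messages one connection can deliver per hour when each message
    costs a fixed delay (e.g. a greet pause) plus its transfer time."""
    transfer_s = size_bytes / bandwidth_bytes_per_s
    return 3600.0 / (delay_s + transfer_s)


if __name__ == "__main__":
    bandwidth = 100_000            # bytes per second (assumed)
    small, large = 10_000, 500_000  # message sizes in bytes (assumed)
    for delay in (0.0, 30.0):
        r_small = messages_per_hour(small, bandwidth, delay)
        r_large = messages_per_hour(large, bandwidth, delay)
        print(delay, round(r_small, 1), round(r_large, 1),
              round(r_small / r_large, 2))
```

In this model, without any pause the larger message cuts the rate by the full size ratio, while with a long pause the delay dominates and message size barely changes the rate. A sender running many connections in parallel would scale both figures up, but the ratio between them stays the same.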