Re: Spam and the Internet [Was: xxxl spam]

2006-04-17 Thread Alan Premselaar

Matt Kettler wrote:
...snip...

 Here's one, if you want to see it:
 
 http://mywebpages.comcast.net/mkettler/spam.jpg
 
 
 There's pretty close to zero chance that anyone in the US is going to hop on a
 plane and fly to Guatemala to buy ordinary lawn care products from a small
 store. But that's the kind of ads I'm getting.

but they've got heart-shaped pancake molds... you wouldn't fly to
Guatemala for that?  and at Q.29?! what a bargain!


(heh, i couldn't resist)


Re: Non-English languages (was: xxxl spam)

2006-04-14 Thread John Rudd


On Apr 13, 2006, at 9:46 PM, Kenneth Porter wrote:

On Thursday, April 13, 2006 10:32 PM -0600 Paul R. Ganci 
[EMAIL PROTECTED] wrote:


Unfortunately I am still a linguistic idiot and only speak English... a
Buffalo, NY version at that! My grandparents came over from Italy in
1920 and promptly stopped speaking Italian around my parents. It forced
my parents to learn English at the cost of never learning Italian. There
is plenty of room to accommodate two languages, but neither the US
education system nor home life is set up to do it.


Same here. I took a couple years of high school Spanish in California 
and the classes dragged so incredibly slowly that I learned just a 
little vocabulary and the most basic of grammar, and still led the 
class. I usually finished my physics homework in that class while 
waiting for everyone to catch up.


As a programmer I envy my professional peers who can speak Japanese 
and other non-European languages. My interest in programming languages 
extends to natural languages, and I find their differences 
fascinating.


To those of you who've successfully learned 2nd and 3rd languages as 
an adult, what do you recommend for accomplishing that?




I wish I had stuck with German in HS.  And I wish I had taken the time 
to learn Latin and/or Greek back when I had all of that free time on my 
hands in HS.  These days, it seems like everyone* ought to know (in 
addition to English) Spanish, and then a choice of French, Chinese, or 
Japanese.


(* in the US, I don't mean globally; globally, I'd probably say that we 
should all know 3 out of those 5, but that's just me making 
wild-a*s-suggestions for a world that doesn't care about my opinion ;-) 
)


And, reiterating Kenneth's question: Anyone have advice for an almost 
middle-aged person who wants to go about expanding his natural language 
capabilities?


(Hmm... that's probably a dumb question for me. I think all of those
are taught at the university where I work, and I can take free classes;
I could add Italian, Latin, and Greek too. Still, for everyone who
doesn't work for a university but has a similar thought, it's a
good question to ponder.)




Re: xxxl spam

2006-04-14 Thread Philip Prindeville
mouss wrote:

  and I've got plenty of users that speak
  multiple languages, not all of which use plain-ascii.




I guess so. Now I'm not sure our situation isn't worse, because people
tried to find non-standard solutions that are still in use. I still
remember the days when some customers were asking us to fix our
software because it broke their accents... hopefully those times are
gone, but I still see broken mail (much more than I should). Actually,
I also see mail that doesn't get rendered correctly in Thunderbird, so
I'll admit that the issue isn't really about accented chars...
  


This is a real sore point for me.  I worked on the MIME quoted-printable
encoding 14 years ago, and in some ways we haven't come nearly as far as
we should have (see my posts as [EMAIL PROTECTED] when I was at France
Telecom).

A lot of it has to do with idiots like Microsoft pushing competing
standards (like Windows-1252) that offer no advantage whatsoever over the
established standards (like ISO Latin-1) and serve only to increase the
exponential problem of interoperability matrices (the number of ways each
agent must be tested against other agents, etc.), thereby guaranteeing
that complete testing of all possible permutations becomes an
unattainable goal receding ever more quickly towards the horizon.

We could have been smart and limited ourselves to a manageable and very
finite set of permutations instead...
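A minimal Python sketch of the interoperability problem described above, using only the standard library's cp1252, latin-1 and quopri codecs (the sample phrase is made up): Windows-1252 stores "smart" quotes at bytes 0x93 and 0x94, which ISO Latin-1 reserves for C1 control characters, and quoted-printable faithfully carries those bytes without fixing a wrong charset label.

```python
import quopri

# Made-up phrase typed in a Windows mail client with "smart" quotes.
text = "\u201cd\u00e9j\u00e0 vu\u201d"

# Windows-1252 maps the curly quotes to 0x93 and 0x94.
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes)  # b'\x93d\xe9j\xe0 vu\x94'

# A receiver that believes the part is ISO Latin-1 decodes the accented
# letters correctly, but the quotes become C1 control characters (0x80-0x9F).
misread = cp1252_bytes.decode("latin-1")
c1_controls = [hex(ord(c)) for c in misread if 0x80 <= ord(c) <= 0x9F]
print(c1_controls)  # ['0x93', '0x94']

# Quoted-printable (RFC 2045) only protects the bytes in transit;
# the charset label still decides how they are interpreted.
print(quopri.encodestring(cp1252_bytes))  # b'=93d=E9j=E0 vu=94'
```

The accented letters survive because Windows-1252 and Latin-1 agree on 0xA0-0xFF; only the 0x80-0x9F range differs.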

This is why our site has the following rule:

# don't allow windows-125x text attachments...
mimeheader __CTYPE_MH_WIN1252   Content-Type =~ /charset=\"?windows-125[0-8]\"?/i
meta L_WIN_CHARSET  ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
describe L_WIN_CHARSET  Content-Type is Windows-specific text
score L_WIN_CHARSET 0.1


I should probably do the same for non-MIME content, but it's not as much
of a problem since Outlook prefers MIME content.
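The rule above can be approximated in Python to check which Content-Type headers it would hit. This is a hedged sketch, not SpamAssassin's Perl implementation: it looks at one header at a time, while a real meta rule combines subrule hits from anywhere in the message. It also assumes the __CTYPE_MH_HTML and __CTYPE_MH_TEXT_PLAIN subrules (defined elsewhere, not shown in the post) match text/html and text/plain parts.

```python
import re

# Same pattern as __CTYPE_MH_WIN1252: an optionally quoted charset
# parameter naming windows-1250 through windows-1258.
WIN_CHARSET = re.compile(r'charset="?windows-125[0-8]"?', re.IGNORECASE)


def l_win_charset(content_type: str) -> bool:
    """Approximate L_WIN_CHARSET for a single Content-Type header value."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    is_text = media_type in ("text/html", "text/plain")
    return is_text and WIN_CHARSET.search(content_type) is not None


for header in (
    'text/plain; charset="windows-1252"',
    "text/html; charset=Windows-1250",
    "text/plain; charset=iso-8859-1",
    'application/octet-stream; charset="windows-1252"',
):
    print(f"{header:50} -> {l_win_charset(header)}")
```

The case-insensitive flag mirrors the `/i` modifier on the original rule, so "Windows-1250" and "windows-1250" are treated alike.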

If anyone wants to talk to us, they can stick with ISO Latin-1.  We
don't need no stinkin' Windows-125x...  (or -839 for that matter).

-Philip



Re: Non-English languages (was: xxxl spam)

2006-04-14 Thread Roger Taranto
On Thu, 2006-04-13 at 23:38, John Rudd wrote:
 And, reiterating Kenneth's question: Anyone have advice for an almost
 middle-aged person who wants to go about expanding his natural language
 capabilities?

There was an article in Newsweek a few weeks back about language
immersion vacations.  Here's the related msnbc story:
http://msnbc.msn.com/id/11481528/

-Roger


Re: xxxl spam

2006-04-14 Thread Michael Monnerie
On Friday, 14 April 2006 06:32 Paul R. Ganci wrote:
 Start young when it is easy for kids to pick up the sounds.

Yes, my daughter has the advantage of learning German with me, French
with my wife, and later at school she will learn English anyway.

Still, people in Belgium have it easier: in addition to English, German,
and French, they learn Dutch and their local flavor, a mix of all these
languages (which Dutch already is anyway).

The funniest party concerning languages I had was on Crete (an island
of Greece): it was a party with all the tourist guides, about 20
people and at least 9 different languages, where each could speak at
least 2, often 4... now that's a mess :-)

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Non-English languages (was: xxxl spam)

2006-04-14 Thread Manuel Giorgini
[2006-04-14 08:38:46] John Rudd,

I wish to start by greeting the list; I am a recent addition and I have been
lurking for the past two weeks. You guys already make enough traffic. :-)

JR And, reiterating Kenneth's question: Anyone have advice for an almost
JR middle-aged person who wants to go about expanding his natural language 
JR capabilities?

I am an Esperanto speaker.

There are many reasons to give it a try. These are pretty much universally
accepted:

For one, it's quite simple to learn for those who already know an
Indo-European language; after a couple of months you'll be able to sustain a
decent conversation.

It also helps in recognizing and understanding other languages. There have been
experiments on this.

There are also social and personal reasons. I won't go into them, though.
If you are really interested you'll find them out by yourself. I will only say
that I found the language really intriguing, very expressive, and fun.

A few pointers,

http://www.esperanto.se/dok/praguemanifesto.html
http://www.lernu.net


Cordialità / Best regards / Gxis la

Manuel Giorgini [EMAIL PROTECTED], Programmatore
INTERLOGICA e-business solutions -  http://www.interlogica.net
Via Fusinato, 27 - IT 30171 Mestre VE - Italia - Unione Europea
Tel +39 041 099 30 00 (6 linee r.a.) - Fax +39 041 504 11 72




Re: Non-English languages (was: xxxl spam)

2006-04-14 Thread Michael Monnerie
On Friday, 14 April 2006 06:46 Kenneth Porter wrote:
 To those of you who've successfully learned 2nd and 3rd languages as
 an adult, what do you recommend for accomplishing that?

There are books called Assimil (you simply assimilate the language with
them), where you learn in a very natural way by speaking full sentences
from the beginning. It looks very complicated at first, but turns out to
be quite easy. I managed to learn Greek in a very short time with it,
and now I'm struggling with French (which is quite hard, though).

http://www.assimil.com/

mfg zmi


Re: Non-English languages (was: xxxl spam)

2006-04-14 Thread Manuel Giorgini
[2006-04-14 06:46:51] Kenneth Porter,

KP To those of you who've successfully learned 2nd and 3rd languages as an
KP adult, what do you recommend for accomplishing that?

Once you finish the basic/intermediate courses, find a pen pal, or more
than one, as soon as you can. With the Internet it's quite easy. A friend of
mine seeks out foreign people willing to learn Italian, and they help each
other this way. There are websites set up for this, if I'm not mistaken.
Practising really helps.


Cordialità / Best regards / Gxis la





Re: xxxl spam

2006-04-14 Thread John Rudd


On Apr 14, 2006, at 12:40 AM, Michael Monnerie wrote:


On Friday, 14 April 2006 06:32 Paul R. Ganci wrote:

Start young when it is easy for kids to pick up the sounds.


Yes, my daughter has the advantage of learning German with me, French
with my wife, and later at school she will learn English anyway.

Still, people in Belgium have it easier: in addition to English, German,
and French, they learn Dutch and their local flavor, a mix of all these
languages (which Dutch already is anyway).

The funniest party concerning languages I had was on Crete (an island
of Greece): it was a party with all the tourist guides, about 20
people and at least 9 different languages, where each could speak at
least 2, often 4... now that's a mess :-)



My favorite story isn't that extreme.  It's about a friend of mine who
did his senior year of HS as a study-abroad student.  He had learned
German in HS, but was sent to Denmark (close) and spent that year
learning Danish.


When he came back, there was this big party thing in Washington DC for 
all of the exchange students going in both directions.  He came back to 
the US not having spoken any English for a year, and was put in a hotel 
room with someone who had been in Germany not speaking English for a 
year, and a German who had been speaking only English for a year.  So, 
none of them was entirely comfortable going back to speaking their 
native language yet, none of them had been speaking the same language 
as the other two during that year ... and they stayed up all night 
talking.  At first, each just spoke the language they had been speaking 
for the year, and the other two just understood.


I think Daniel said that by morning, he was speaking English again :-}




Re: xxxl spam

2006-04-13 Thread hamann . w

Hi,

To read this in other words: while certain analysts (and definitely
Microsoft marketing) claim that about 50% of all servers run Windows,
these figures suggest that real mail servers (those that deliver the
ham part of mail) rarely run XP, but that this OS is the best candidate
for becoming a spam zombie.

Wolfgang Hamann


p0f OS guess        ham :  spam
--------------------------------
Windows-XP        0.7 % : 99.3 %
Windows-2000      5.8 % : 94.2 %
UNKNOWN          16.5 % : 83.5 %
Linux            58.8 % : 41.2 %
Unix             80.3 % : 19.7 %
(Unix+Linux      66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!!
This is ideal information for contributing two or three score points.
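One way to read the table is to turn each row's ham:spam split into spam-to-ham odds. A minimal Python sketch using only the percentages quoted above. Note that SpamAssassin scores are tuned jointly over a corpus and are not log-odds, so the log10 column is only a rough measure of how lopsided each category is, and the split itself depends on where in the mail path it was measured.

```python
import math

# Ham/spam split per p0f OS guess, copied from the table above (percent).
TABLE = {
    "Windows-XP": (0.7, 99.3),
    "Windows-2000": (5.8, 94.2),
    "UNKNOWN": (16.5, 83.5),
    "Linux": (58.8, 41.2),
    "Unix": (80.3, 19.7),
}


def spam_odds(ham_pct: float, spam_pct: float) -> float:
    """Spam:ham odds for mail arriving from one OS category."""
    return spam_pct / ham_pct


for os_guess, (ham, spam) in TABLE.items():
    odds = spam_odds(ham, spam)
    print(f"{os_guess:13} odds {odds:7.2f}:1   log10 {math.log10(odds):+.2f}")
```

Categories with odds below 1 (Linux, Unix) send more ham than spam, which is why a negative score is the natural fit for them.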







Re: xxxl spam

2006-04-13 Thread Daryl C. W. O'Shea

Mark Martinec wrote:


The most interesting part in my view is not the IP distance, but the
type of OS, illustrated by the following table (derived from the same
data as fig2):

p0f OS guess        ham :  spam
--------------------------------
Windows-XP        0.7 % : 99.3 %
Windows-2000      5.8 % : 94.2 %
UNKNOWN          16.5 % : 83.5 %
Linux            58.8 % : 41.2 %
Unix             80.3 % : 19.7 %
(Unix+Linux      66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!!
This is ideal information for contributing two or three score points.


I'm not sure the ham hit rate from the Windows-XP category scales (to 
other installations) very well.  The last time I looked into using p0f 
to fingerprint connecting hosts, last spring, I seem to recall that 
Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint 
identically.


While it'd be nice to score Windows-XP hosts harshly, there's a lot
of mail coming from Windows Server 2003 hosts that would get hit.


I know for some of my systems 1:99 would be really low if Windows Server 
2003 and XP are identified the same.  40:60 (and in some cases 80:20) 
would be closer to what I often see if I were to assume that all spam 
came from Windows XP hosts.


Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?


Daryl


Re: xxxl spam

2006-04-13 Thread Loren Wilton
 To read this in other words: while certain analysts (and definitely
 Microsoft marketing) claim that about 50% of all servers run Windows,
 these figures suggest that real mail servers (those that deliver the
 ham part of mail) rarely run XP, but that this OS is the best candidate
 for becoming a spam zombie.

Not completely unreasonable.  XP is targeted within MS as a personal or very
small company OS.  The equivalent of a linux/unix system used by more than a
single person would typically be some version of Server 2003.  Which was
probably identified in the stats as Windows 2000.

I'd like to venture the suggestion that the percentage of spam from XP isn't
necessarily an indication of inherent bugginess.  It is more an indication
that it is an OS for Clueless Noobs who haven't a clue about maintaining a
system, avoiding a virus, or even telling whether they have a virus.  These
are the machines that turn into zombies.

If there were as many linux machines in the hands of Clueless Noobs, I'd bet
that the percentage of infected linux systems would be in a similar
range.  Remember, these XP systems are virtually all run with Administrator
(aka root) privs all the time, by people that haven't a clue what that
means.  (What would happen if all linux-like systems ran that way?)

Loren



Re: xxxl spam

2006-04-13 Thread Mark Martinec
Wolfgang, Loren,
  real mail servers (those that deliver the ham part of mail) rarely ever
  run XP but that this OS is the best candidate for creating a spam zombie

 Not completely unreasonable.  XP is targeted within MS as a personal or
 very small company OS.  The equivalent of a linux/unix system used by more
 than a single person would typically be some version of Server 2003.  Which
 was probably identified in the stats as Windows 2000.

 I'd like to venture the suggestion that the percentage of spam from XP
 isn't necessarily an indication of inherent bugginess.  It is more an
 indication that it is an OS for Clueless Noobs who haven't a clue about
 maintaining a system, avoiding a virus, or even telling whether they have a
 virus.  These are the machines that turn into zombies.

I fully agree.

In this light, the following two lines are worth a look as well:

p0f OS guess        ham :  spam
Linux            58.8 % : 41.2 %
Unix             80.3 % : 19.7 %

Linux is used by the masses (compared to other Unix OS types) because it is
considered to be easier to set up. Possibly this also means that less care
is invested in preventing it from being used to propagate spam.

Still, a score of -1.0 for L_P0F_Unix seems to be doing a good job here.


Daryl,
 I'm not sure the ham hit rate from the Windows-XP category scales (to
 other installations) very well.  The last time I looked into using p0f
 to fingerprint connecting hosts, last spring, I seem to recall that
 Windows XP and Windows 2003 share the same TCP/IP stack and fingerprint
 identically.

 While it'd be nice to score Windows-XP hosts harshly, there's a lot
 of mail coming from Windows Server 2003 hosts that would get hit.

There is indeed a handful of valid small sites classified by p0f as Windows XP
from which we do receive regular mail (well, newsletters and such, but they
should still be treated mostly as ham). I don't see adding a few score points
to them as much different from other (sometimes quite arbitrary) rules: each
rule tries to have a low FP rate, but it is often not zero. Only the collection
of all rules taken together has merit.
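The point that only the collection of rules has merit can be made concrete with a small Python sketch. The rule names and scores are hypothetical; the only real value is the 5.0 threshold, which is SpamAssassin's default required_score. A newsletter from an XP-fingerprinted host picks up the extra points but stays under the threshold because another rule pulls it back down.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits: dict[str, float]) -> float:
    """Sum the scores of all rules that fired on one message."""
    return sum(hits.values())


def is_spam(hits: dict[str, float], threshold: float = REQUIRED_SCORE) -> bool:
    """Treat a message as spam once its total reaches the threshold."""
    return total_score(hits) >= threshold


# Hypothetical rule hits; names and scores are illustrative only.
newsletter = {"HYPO_P0F_WINXP": 2.5, "HYPO_HTML_NEWSLETTER": 0.5, "HYPO_BAYES_LOW": -2.0}
zombie_spam = {"HYPO_P0F_WINXP": 2.5, "HYPO_FORGED_HELO": 1.5, "HYPO_BAYES_HIGH": 3.0}

for name, hits in (("newsletter", newsletter), ("zombie spam", zombie_spam)):
    print(f"{name}: {total_score(hits):.1f} -> spam={is_spam(hits)}")
```

The same 2.5-point fingerprint rule is harmless on the newsletter (total 1.0) and decisive only when other evidence agrees (total 7.0).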

 I know for some of my systems 1:99 would be really low if Windows Server
 2003 and XP are identified the same.  40:60 (and in some cases 80:20)
 would be closer to what I often see if I were to assume that all spam
 came from Windows XP hosts.
 Maybe you don't receive much, if any, mail from Windows Server 2003 hosts?

I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts;
I'm not sure what is unusual about them.

  Mark


Re: xxxl spam

2006-04-13 Thread Daryl C. W. O'Shea

Mark Martinec wrote:


I guess Windows Server 2003 is reported as Windows 2000, but I don't know.
Certainly a couple of very large sites are seen as Windows 2000.

In the UNKNOWN category there must be a mix of Windows and Unix hosts;
I'm not sure what is unusual about them.

  Mark


Hmm... FWIW:

[EMAIL PROTECTED] dos]$ sudo p0f -i eth1
p0f - passive os fingerprinting utility, version 2.0.4
(C) M. Zalewski [EMAIL PROTECTED], W. Stearns [EMAIL PROTECTED]
p0f: listening (SYN) on 'eth1', 223 sigs (12 generic), rule: 'all'.
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  -> 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  -> 24.141.168.241:783 (distance 19, link: ethernet/modem)


24.141.168.241 is Windows XP Pro SP1
66.98.221.156 is Windows Server 2003 SP1 (Standard Edition)
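A small Python sketch that parses the p0f output above and checks which guesses mention XP. It assumes the p0f 2.x layout shown here (a "source:port - OS guess" line followed by an indented "-> destination:port (distance N, ...)" line); the arrows are assumed to be "->", as the archive appears to have dropped the ">". Both hosts, the XP Pro SP1 box and the Server 2003 box, come back with a guess that mentions XP, which is why a rule keyed on that fingerprint would also hit Server 2003 senders.

```python
import re

P0F_OUTPUT = """\
24.141.168.241:4218 - Windows XP Pro SP1, 2000 SP3
  -> 66.98.221.156:25 (distance 1, link: ethernet/modem)
66.98.221.156:2602 - Windows 2000 SP4, XP SP1
  -> 24.141.168.241:783 (distance 19, link: ethernet/modem)
"""

# "src_ip:port - OS guess", then "  -> dst_ip:port (distance N, ...)".
SRC_RE = re.compile(r"^(?P<ip>[\d.]+):(?P<port>\d+) - (?P<guess>.+)$")
DST_RE = re.compile(r"^\s+-> (?P<ip>[\d.]+):(?P<port>\d+) \(distance (?P<dist>\d+)")


def parse(text: str) -> list[dict]:
    """Turn p0f 2.x text output into one record per connection."""
    records = []
    for line in text.splitlines():
        if m := SRC_RE.match(line):
            records.append({"src": m["ip"], "guess": m["guess"]})
        elif (m := DST_RE.match(line)) and records:
            records[-1]["dst_port"] = int(m["port"])
            records[-1]["distance"] = int(m["dist"])
    return records


for rec in parse(P0F_OUTPUT):
    print(rec["src"], "|", rec["guess"], "| mentions XP:", "XP" in rec["guess"])
```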


Daryl



Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 12:12 AM, Loren Wilton wrote:

I'd like to venture the suggestion that the percentage of spam from XP
isn't necessarily an indication of inherent bugginess.  It is more an
indication that it is an OS for Clueless Noobs who haven't a clue about
maintaining a system, avoiding a virus, or even telling whether they
have a virus.  These are the machines that turn into zombies.



While I don't disagree with your assessment of XP systems, I have a 
different hunch about why such a large percentage of the mail coming 
from XP systems is spam, and a smaller percentage of mail coming from 
the other systems is spam:


a) In general, XP systems are not servers, and therefore, are not mail 
servers.


b) Due to (a), if you do your mail/spam/virus scanning on machines that 
do not receive direct connections from your own clients 
(mail/spam/virus scanning at the border), OR if you do not have a high 
percentage of XP clients in your domain, then your scanning systems 
will not receive many (if any) legitimate direct connections from XP 
clients ... because a legitimate mail sending process on an XP system 
will be directly connecting to their own domain's mail server, and not 
to YOUR mail scanning systems.


c) Thus, if you meet the conditions in (b), and if we accept (a) as
true, then the vast majority of connections you receive from XP 
systems, on your mail scanning systems, will be from spam/virus bots 
trying to directly submit spam or virus laden messages to your mail 
gateways instead of submitting it to their own mail servers (as bots 
are known to do).



We would expect to see a lower percentage of spam from server type OSes 
(or OSes that can be clients or servers) because a higher percentage of 
those platforms are used as legitimate mail servers.
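The base-rate effect behind points (a) to (c) can be shown in a few lines of Python. The connection counts are invented purely for illustration: the same population of XP machines looks almost entirely spammy at a border gateway (where legitimate XP clients never connect directly) and almost entirely clean at an internal submission server.

```python
def spam_share(legit: int, bots: int) -> float:
    """Fraction of XP connections that carry spam at one vantage point."""
    return bots / (legit + bots)


# Invented daily connection counts from XP hosts, for illustration only.
# Border gateway: your own XP clients relay through an internal server,
# so only outside XP machines connect, and most of those are bots.
border = spam_share(legit=10, bots=990)

# Internal submission server: your own XP clients connect directly.
internal = spam_share(legit=5000, bots=50)

print(f"border gateway: {border:.1%} spam, submission server: {internal:.1%} spam")
```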


The other factor here is: while I _hate_ linux, how much of the spam 
being submitted by linux boxes is merely a mail server relaying on 
behalf of one of their infected clients? (same with the unix systems, 
and the 2000/2003 systems)  And thus not at all indicative of the 
quality of linux systems administration out on the internet.



I think this is one of those cases where the statistics work as blind
observations of behavior, but attempting to describe _why_ the
statistics work is not something you can sum up with a simple and
straightforward explanation.  Kinda like QM.





Re: xxxl spam

2006-04-13 Thread mouss

John Rudd wrote:
While I don't disagree with your assessment of XP systems, I have a 
different hunch about why such a large percentage of the mail coming 
from XP systems is spam, and a smaller percentage of mail coming from 
the other systems is spam:


a) In general, XP systems are not servers, and therefore, are not mail 
servers.


b) Due to (a), if you do your mail/spam/virus scanning on machines that 
do not receive direct connections from your own clients (mail/spam/virus 
scanning at the border), OR if you do not have a high percentage of XP 
clients in your domain, then your scanning systems will not receive many 
(if any) legitimate direct connections from XP clients ... because a 
legitimate mail sending process on an XP system will be directly 
connecting to their own domain's mail server, and not to YOUR mail 
scanning systems.


c) Thus, if you meet the conditions in (b), and if we accept (a) as
true, then the vast majority of connections you receive from XP systems, 
on your mail scanning systems, will be from spam/virus bots trying to 
directly submit spam or virus laden messages to your mail gateways 
instead of submitting it to their own mail servers (as bots are known to 
do).



We would expect to see a lower percentage of spam from server type OSes 
(or OSes that can be clients or servers) because a higher percentage of 
those platforms are used as legitimate mail servers.


The other factor here is: while I _hate_ linux, how much of the spam 
being submitted by linux boxes is merely a mail server relaying on 
behalf of one of their infected clients? (same with the unix systems, 
and the 2000/2003 systems)  And thus not at all indicative of the 
quality of linux systems administration out on the internet.



I think this is one of those cases where the statistics work as blind
observations of behavior, but attempting to describe _why_ the
statistics work is not something you can sum up with a simple and
straightforward explanation.  Kinda like QM.




<ot>
I agree that statistics aren't the whole story. You can study the
percentage of thieves/criminals based on skin color and origin (some
people already do it, and many jump to conclusions without studies), but
you can do the same study based on the social situation and past history
of people. The first researcher will probably conclude that
black/arabic/latin/... people are more criminal. The second
researcher will instead conclude that criminality is more often seen in
poor communities, but that these aren't the worst criminals (killing vs
stealing, for instance).

</ot>

Back to XP and co. My feeling (no, I didn't run a study and won't) is
that even if a study showed that we get more spam from XP than
from linux, I would not use this to classify my mail.


I am certain that if you do stats on mail date, you'll find that some
dates correspond to more spam than others. We've already seen people
jumping to block specific mailers (The Bat!, for instance) based on their
stats. I am also seeing a lot of legit mail triggering some SA rules
(*_excess, no_real_name, x_library, ...). When I see this, I check the
rule, and if I can't find a justification, I disable it.




Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 9:56 AM, mouss wrote:



I am also seeing a lot of legit mail triggering some SA rules (*_excess,
no_real_name, x_library, ...). When I see this, I check the rule, and
if I can't find a justification, I disable it.




I wouldn't do that.

Just because legitimate mail triggers some rule doesn't mean that the 
rule is flawed.  Using your example, triggering no_real_name does not 
mean that the message is spam, it means that the message has _some_ 
similarity to at least some spam messages (the higher the score, the 
stronger the similarity).  And, that's absolutely true: statistically, 
when looking at the corpus which was used to create the rules database, 
a higher percentage of no_real_name messages were spam.


Now, if legit messages were not just triggering those rules, but also 
triggering enough rules to be flagged as spam ... then I would lower 
the value of those rules, but not disable those rules.  But I would 
only do that if I could see that there was a large percentage of 
should-be-ham messages being flagged as spam by that rule AND that rule 
wasn't being useful in flagging spam messages.  The reason is: if the 
message is being flagged, but it shouldn't have been, then perhaps my 
corpus of messages differs significantly enough from the SA internal 
corpus that my score values need to be different.  But that doesn't 
mean that the rules are so disjoint from tracking spam that they should 
be entirely disabled.  They just don't have the same weighting that my 
corpus needs.


If, instead, most messages passing through my mail servers that
triggered that rule really did seem to be spam, then I wouldn't alter
the score at all.  I would just pass the should-have-been-ham message
into my Bayesian learner and hope that a low Bayes score for messages
like that would offset the rules that had flagged it as spam.
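The procedure above for a rule that legitimate mail keeps hitting can be written down as a small Python function. Everything here is an assumption made for illustration: the counts would come from your own corpus, and the 50% "mostly spam" and 10% "large share of false positives" cut-offs are arbitrary placeholders, not SpamAssassin settings. The one real command referenced is sa-learn --ham, which feeds a message to the Bayes learner as ham.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Local-corpus counts for one rule (hypothetical numbers)."""
    spam_hits: int    # spam messages that hit the rule
    ham_hits: int     # legitimate messages that hit the rule
    ham_flagged: int  # of ham_hits, how many ended up at or above the threshold


def advise(s: RuleStats, mostly_spam: float = 0.5, large_fp: float = 0.10) -> str:
    """Return keep / lower / leave advice; this procedure never disables a rule."""
    total = s.spam_hits + s.ham_hits
    spam_share = s.spam_hits / total if total else 0.0
    fp_share = s.ham_flagged / s.ham_hits if s.ham_hits else 0.0
    if spam_share > mostly_spam:
        # The rule tracks spam locally: keep it and train Bayes on the misses.
        return "keep score; feed the false positives to sa-learn --ham"
    if fp_share >= large_fp:
        # Local mail differs from the corpus the scores were tuned on.
        return "lower score (do not disable)"
    # Ham hits the rule but is not pushed over the threshold: harmless.
    return "leave as is"


print(advise(RuleStats(spam_hits=900, ham_hits=100, ham_flagged=5)))
print(advise(RuleStats(spam_hits=100, ham_hits=900, ham_flagged=300)))
print(advise(RuleStats(spam_hits=100, ham_hits=900, ham_flagged=10)))
```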




Re: xxxl spam

2006-04-13 Thread Matt Kettler
mouss wrote:
 I also understand that US guys may get fewer encoded subjects, but at least in
 .fr, we have that all the time (because of our accented letters, and because
 many companies still use software that predates MIME). And if I find a
 legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.

Sounds like we need more non-US based corpus contributors. After all, the SA
devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
encoding issue, as they still use ordinary English characters over there for
most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just developed in the US by US citizens for US
markets.

However, it is true that the vast majority of the corpus currently comes from
folks who speak English (King's or Yankee) as a primary language, and that's a
bit of a problem as it creates considerable bias in the rules.

And even we US folks do have encoding issues. After all, English is not our
official language here in the US, and I've got plenty of users that speak
multiple languages, not all of which use plain-ascii.
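For readers unfamiliar with the "encoded subjects" mentioned above: non-ASCII header text travels as RFC 2047 encoded-words. A minimal sketch with Python's standard email.header module; the sample subject is made up.

```python
from email.header import Header, decode_header, make_header

# Made-up Subject header as a French mailer might send it (RFC 2047, Q encoding;
# "_" stands for a space and =E9 is "é" in ISO-8859-1).
raw_subject = "=?iso-8859-1?q?R=E9sum=E9_de_la_r=E9union?="

# decode_header() yields (bytes, charset) chunks; make_header() joins them
# back into one Unicode string.
decoded = str(make_header(decode_header(raw_subject)))
print(decoded)  # Résumé de la réunion

# Going the other way: encode a Unicode subject for the wire, then decode it.
wire = Header("Réunion à 14h", charset="utf-8").encode()
print(wire)
print(str(make_header(decode_header(wire))))
```

A client that skips the decoding step shows the raw "=?iso-8859-1?q?...?=" text, which is one common way accented subjects end up looking broken.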




Re: xxxl spam

2006-04-13 Thread mouss

Matt Kettler wrote:

mouss wrote:
I also understand that US guys may get fewer encoded subjects, but at least in .fr, we have that all the time (because of our accented letters, and because many companies still use software that predates MIME). And if I find a legitimate IP in a dnsbl used by SA, then I just remove that dnsbl.


Sounds like we need more non-US based corpus contributors. After all, the SA
devs can only work with what they get.

Also, bear in mind that SpamAssassin's creator, Justin Mason, isn't based in the
US. Last I checked he was in Ireland. Unfortunately this doesn't help with the
encoding issue, as they still use ordinary English characters over there for
most things. (I don't think Gaelic is very common in email.)

So bear in mind that SA isn't just developed in the US by US citizens for US
markets.


oh, I never meant that.



However, it is true that the vast majority of the corpus currently comes from
folks who speak English (King's or Yankee) as a primary language, and that's a
bit of a problem as it creates considerable bias in the rules.

And even we US folks do have encoding issues. After all, English is not our
official language here in the US,


what do you mean here? what would be your official language?

 and I've got plenty of users that speak
 multiple languages, not all of which use plain-ascii.



I guess so. Now I'm not sure our situation isn't worse, because people
tried to find non-standard solutions that are still in use. I still
remember the days when some customers were asking us to fix our
software because it broke their accents... hopefully those times are
gone, but I still see broken mail (much more than I should). Actually,
I also see mail that doesn't get rendered correctly in Thunderbird, so
I'll admit that the issue isn't really about accented chars...




Re: xxxl spam

2006-04-13 Thread Matt Kettler
mouss wrote:

 However, it is true that the vast majority of the corpus currently
 comes from folks who speak English (King's or Yankee) as a primary
 language, and that's a bit of a problem as it creates considerable
 bias in the rules.

 And even we US folks do have encoding issues. After all, English is
 not our official language here in the US,
 
 what do you mean here? what would be your official language?

The United States of America does not have any official language.

Americanized English is our common language, but it's not official. This means
that our government has to supply forms and materials in many languages for its
citizens, because it cannot require that citizens speak English.

For example, we have tax forms in French:

http://www.irs.gov/pub/irs-access/f2290fr_accessible.pdf

Admittedly, non-English forms and services are somewhat secondary here, but they
are present.

 
  and I've got plenty of users that speak
 multiple languages, not all of which use plain-ascii.

 
 I guess so. Now I'm not sure our situation isn't worse, because people
 tried to find non-standard solutions that are still in use. I still
 remember the days when some customers were asking us to fix our
 software because it broke their accents... hopefully those times are
 gone, but I still see broken mail (much more than I should). Actually,
 I also see mail that doesn't get rendered correctly in Thunderbird, so
 I'll admit that the issue isn't really about accented chars...
 

Well, yours is certainly worse, or at least more prevalent, than the problem
here in the US, but I would not say it's the worst.

Generally speaking the worst case seems to be present in smaller Asian nations,
which have really extensive use of non-US characters. At least the French can
restrict their text to the same character set as English and still be readable,
although it reads awkwardly with the accents stripped.
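The remark that French still reads acceptably when squeezed into the English character set can be illustrated with Python's standard unicodedata module. A hedged sketch: decompose with NFKD and drop the combining marks. It loses information (ligatures such as "œ" have no decomposition, so they survive as non-ASCII), and it does nothing for CJK text, which is the harder case described above.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose (NFKD) and drop combining marks such as acute accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Le garçon a mangé à côté du théâtre"))
# Le garcon a mange a cote du theatre

# Ideographs have no accent-free ASCII form, so nothing changes here.
print(strip_accents("日本語").isascii())  # False
```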

Also, smaller Asian nations still to this day have a high prevalence of
locally-grown mail clients, many of which are not even remotely RFC compliant,
but work well with others in the same locale.

They're also much more likely to make use of mixed-language text containing many
character sets. Speaking 2 or 3 different languages is fairly common in the
smaller countries of the Asian region, just due to necessity for trade with
neighboring countries.

Another area with this same basic issue would be the Middle East, but the number
of completely different character sets is smaller.






Re: xxxl spam

2006-04-13 Thread John Rudd


On Apr 13, 2006, at 11:40 AM, mouss wrote:


Matt Kettler wrote:


And even us US folks do have encoding issues. After all, English is
not our official language here in the US,


what do you mean here? what would be your official language?



The US doesn't have an official language.

By default, it is assumed to be English for most things, but it's not
official.  And, in some regions within the US, official govt signs
and documents come in various languages. The reasons why have to do
with liability and legality: since there's no official language, you
can't just pick _one_ language to publish your forms in and be done
with it. If you do, you're neglecting significant minority populations
(and in some regions, those can be quite significant, such as Spanish
speakers in southern Florida or southern California), which then makes
you vulnerable to lawsuits saying that you're discriminating and/or
being negligent toward those significant minorities who aren't
required to speak English, because English isn't an official language.


In order to simplify this, some states have tried to enact official
language legislation.  Florida tried it.  Someone put "Make English the
official state language" on a ballot.  The Cuban-American population in
southern Florida got mad, and put "Make Spanish the official state
language" on the ballot.  Neither one passed, but the Spanish one got
more votes.  This pretty much silenced the English-as-state-language
movement in Florida, as their plan almost backfired on them.


I don't remember any other state trying it since.  The states where 
there wouldn't be any opposition don't need to make it a law ... and in 
states like California where it could matter (reducing costs in govt 
overhead by eliminating multiple languages and the requirement for 
multilingual workers), the English-as-state-language supporters are
afraid of what almost happened in Florida.


So ... sorry for the long winded explanation, but that's what he was 
saying.




Re: xxxl spam

2006-04-13 Thread Loren Wilton
 states like California where it could matter (reducing costs in govt
 overhead by eliminating multiple languages and the requirement for
 multilingual workers), the English as state language supporters are
 afraid of what almost happened in Florida.

Considering that at last census a minority of 54% of California residents
spoke Spanish as their primary or only language...


I predict that the US will be the first country in the 21st century to
abandon English as the national language, while almost all other countries
seem to be mandating that their citizens learn English.

Loren



Re: xxxl spam

2006-04-13 Thread Paul R. Ganci

Loren Wilton wrote:


I predict that the US will be the first country in the 21st century to
abandon English as the national language, while almost all other countries
seem to be mandating that their citizens learn English.

   Loren
 

The problem with the US is that we are linguistic idiots (a quote from a
Columbia University German professor). If you go to Europe, in general
they speak at least two languages fluently: English and the country's
native language. I have had the opportunity to work in both Geneva,
Switzerland and Milan, Italy. All business is conducted in English
and everything else in Italian or, in the case of Switzerland, either
German, Swiss German or French. Essentially all the engineers with whom
I worked could speak two languages or in some cases four. I don't know
what the big deal is. It shouldn't be one language but at least two
here in the US. Start young when it is easy for kids to pick up the sounds.


Unfortunately I am still a linguistic idiot and only speak English ... a
Buffalo, NY version at that! My grandparents came over from Italy in
1920 and promptly stopped speaking Italian around my parents. It forced
my parents to learn English at the cost of never learning Italian. There
is plenty of room to accommodate two languages but neither the US
education system nor home life is set up to do it.


--
Paul ([EMAIL PROTECTED])



Non-English languages (was: xxxl spam)

2006-04-13 Thread Kenneth Porter
On Thursday, April 13, 2006 10:32 PM -0600 Paul R. Ganci 
[EMAIL PROTECTED] wrote:



Unfortunately I am still a linguistic idiot and only speak English ... a
Buffalo, NY version at that! My grandparents came over from Italy in
1920 and promptly stopped speaking Italian around my parents. It forced
my parents to learn English at the cost of never learning Italian. There
is plenty of room to accommodate two languages but neither the US
education system nor home life is set up to do it.


Same here. I took a couple years of high school Spanish in California and 
the classes dragged so incredibly slowly that I learned just a little 
vocabulary and the most basic of grammar, and still led the class. I 
usually finished my physics homework in that class while waiting for 
everyone to catch up.


As a programmer I envy my professional peers who can speak Japanese and 
other non-European languages. My interest in programming languages extends 
to natural languages, and I find their differences fascinating.


To those of you who've successfully learned 2nd and 3rd languages as an 
adult, what do you recommend for accomplishing that?


Re: xxxl spam

2006-04-12 Thread Justin Mason

Theo Van Dinter writes:
 On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote:
  Well, SA automatically ignores attachments in recent versions. However,
  hash-based plugins like razor, dcc, and pyzor work best when seeing all the
  attachments.
 
 For completeness, the first sentence isn't exactly true.
 SA automatically ignores attachments for the standard set of body,
 header, and uri rules, but it still has to read in the data, store it in
 the message tree internally, and make the entire message text available
 for full rules.
 
 There are also things like the AntiVirus plugin, etc, which may go ahead
 and decode attachments and do things with the data.  I could easily see
 a plugin for ClamAV, or something scanning image files, etc.
 
 I think that at some point, the default size could go up, but I wouldn't
 try it for now.

Matt Sergeant had a good trick in the qpsmtpd SpamAssassin plugin, IIRC --
it would download the entire message, but after a certain point (e.g.
250k) it would stop writing the incoming data to memory, and instead flush
the remainder to a temporary file on disk.

That way it could keep only the first 250k of each message in memory,
scan that part, and once complete, reassemble the whole message as it
wrote it back out.
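A minimal Python sketch of the memory/disk split described above, assuming a fixed 250k in-memory limit. The class `SplitSpool` and its methods are hypothetical names; this is not the qpsmtpd plugin's code. Only `scan_view()` would be handed to the filter, and `write_out()` reassembles the original bytes unchanged.

```python
import shutil
import tempfile
from typing import BinaryIO


class SplitSpool:
    """Keep the first `limit` bytes of a message in memory, spill the rest to disk.

    Hypothetical sketch of the trick described above, not the qpsmtpd plugin itself.
    """

    def __init__(self, limit: int = 250 * 1024) -> None:
        self.limit = limit
        self.head = bytearray()               # in-memory prefix: the part that is scanned
        self.tail = tempfile.TemporaryFile()  # everything past `limit` goes to disk

    def write(self, chunk: bytes) -> None:
        """Feed incoming SMTP DATA in arbitrary-sized chunks."""
        room = self.limit - len(self.head)
        if room > 0:
            self.head += chunk[:room]
            chunk = chunk[room:]
        if chunk:
            self.tail.write(chunk)

    def scan_view(self) -> bytes:
        """The bytes a filter would actually be given."""
        return bytes(self.head)

    def write_out(self, out: BinaryIO) -> None:
        """Reassemble the original message: in-memory head, then the spooled tail."""
        out.write(self.head)
        self.tail.seek(0)
        shutil.copyfileobj(self.tail, out)

    def close(self) -> None:
        self.tail.close()
```

Because the tail is copied back verbatim, this only works while the filter leaves the message body alone, which is the rewriting limitation raised further down in this message.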

However there may be issues there -- e.g. consider a multipart/alternative
message containing an innocent-looking 600k text/plain, followed by a 10k
text/html spam payload.  Common MUAs would display the latter, while
SpamAssassin would scan only the former.  That seems to be a vulnerability
to me, although we already don't scan large messages _anyway_ ;)
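To make the multipart/alternative concern concrete, here is a small Python check, assuming a scanner that reads only the first 250k bytes. It builds such a message with the standard `email` package and tests whether the HTML part falls inside that window. The filler text, subject and sizes are made up for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SCAN_LIMIT = 250 * 1024  # assumed prefix size, taken from the 250k example above


def build_padded_alternative() -> bytes:
    """multipart/alternative: ~600k of innocent text/plain, then a tiny text/html part."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "weekly newsletter"
    filler = "nothing to see here, just an ordinary newsletter line\n" * 11_500
    msg.attach(MIMEText(filler, "plain"))
    msg.attach(MIMEText("<html><body>BUY NOW</body></html>", "html"))
    return msg.as_bytes()


def visible_in_prefix(raw: bytes, needle: bytes, limit: int = SCAN_LIMIT) -> bool:
    """True if `needle` lies within the first `limit` bytes a prefix-only scanner sees."""
    return needle in raw[:limit]
```

With this message the text/plain part's headers sit inside the window, while the whole text/html part (the one an MUA would render) lies beyond it.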

Also as Theo said, it fails in the face of any kind of message-body
rewriting by SpamAssassin.

--j.


Re: xxxl spam

2006-04-12 Thread Justin Mason

That's excellent data!  Mind if I forward that around to another
list or two?

The hops measurement is particularly interesting.  Have you got that
implemented as a working rule, in the field?  is it expensive?

--j.

Mark Martinec writes:
 mouss wrote:
  since most filters skip large messages, it may be tempting for spammers
 to send large messages:
 
 I did some statistical analysis a few weeks ago with SA 3.1.1
 (SA called from amavisd-new, but that is beside the point).
 
 Please see:
 
   http://www.ijs.si/software/amavisd/fig4.gif
 Shows spam score vs. mail size as a scattergram
 
   http://www.ijs.si/software/amavisd/fig5.gif
 Shows elapsed time for mail checking vs. mail size
 (shown is total time, but 90% of it reflects processing
 within SA and its plugins)
 
 As a curiosity (but off topic), harvesting results from p0f
 (passive operating system fingerprinting), here are two more:
 
   http://www.ijs.si/software/amavisd/fig1.gif
 Spam score vs. IP distance in hops (our server is
 in European academic network Geant)
 
   And perhaps most interesting of all (but again OT):
 
   http://www.ijs.si/software/amavisd/fig2.gif
 Spam score distribution as a percentage of all mail,
 separated by each sending mail client's operating system.
 
 Mark


Re: Spam and the Internet [Was: xxxl spam]

2006-04-12 Thread Justin Mason

Matt Kettler writes:
 These spams I get from .gt don't offer any kind of online ordering. They
 are ads that you'd have to physically travel to the store in Guatemala
 to take advantage of them. They're ordinary weekly sales fliers for an
 ordinary local store that's so small that only 6 cars can park in front
 of it. (They have pictures of the store in some of them). Delivered to
 my mailbox as 1/2 meg .jpg files. It's really quite bizarre, and
 amusing.
 
 Here's one, if you want to see it:
 
 http://mywebpages.comcast.net/mkettler/spam.jpg

wow.
3 of the cars are photoshopped in, btw. ;)

--j.


Re: xxxl spam

2006-04-12 Thread Mark Martinec
Justin,

 Mark Martinec writes:
  As a curiosity (but off topic), harvesting results from p0f
  (passive operating system fingerprinting), here are two more:
http://www.ijs.si/software/amavisd/fig1.gif
  Spam score vs. IP distance in hops (our server is
  in European academic network Geant)
And perhaps most interesting of all (but again OT):
http://www.ijs.si/software/amavisd/fig2.gif
  Spam score distribution as a percentage of all mail,
  separated by each sending mail client's operating system.

 That's excellent data!  Mind if I forward that around to another
 list or two?

I don't mind.

 The hops measurement is particularly interesting.  Have you got that
 implemented as a working rule, in the field?  is it expensive?

Yes, implemented in the field - comes with the latest amavisd-new-2.4.0.
It inserts one header field with collected information into mail header,
making it available to SA to score it as it wishes (custom rules, bayes).
It could probably just as well be implemented as a SA plugin (making use
of the supplied lightweight p0f-analyzer.pl interface to p0f), but it was
easier for me to do it in amavisd-new, where the remote SMTP client's IP address
is directly accessible, without needing to parse the header and understand the topology.

It is reasonably inexpensive: the cost of running the p0f utility is comparable to
running tcpdump; it takes about one hour of CPU time per month on our medium-busy
mailer, the rest is negligible, and there is no additional latency and no additional
network traffic.

The most interesting part in my view is not the IP distance, but the
type of OS, illustrated by the following table (derived from the same
data as fig2):

p0f OS guess       ham   :  spam
---------------------------------
Windows-XP         0.7 % : 99.3 %
Windows-2000       5.8 % : 94.2 %
UNKNOWN           16.5 % : 83.5 %
Linux             58.8 % : 41.2 %
Unix              80.3 % : 19.7 %
(Unix+Linux       66.5 % : 33.5 %)

Only 0.7% of all mail coming from Windows-XP hosts is ham!!!
It is ideal information for contributing two or three score points.
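One way to turn the table into rule weights, as a Python sketch. The thresholds and point values are assumptions for illustration, not amavisd-new or SpamAssassin defaults; in practice the equivalent would be a SpamAssassin rule matching the header field amavisd-new inserts, and the ham shares are this one site's numbers.

```python
# Ham share (%) per p0f OS guess, copied from the table above (one site's data).
HAM_SHARE = {
    "Windows-XP": 0.7,
    "Windows-2000": 5.8,
    "UNKNOWN": 16.5,
    "Linux": 58.8,
    "Unix": 80.3,
}


def p0f_points(os_guess: str) -> float:
    """Hypothetical mapping from an OS guess to extra spam points.

    Thresholds and point values are illustrative assumptions; tune them
    against your own mail before using anything like this.
    """
    ham = HAM_SHARE.get(os_guess)
    if ham is None:
        return 0.0      # OS not in the table: no opinion
    if ham < 1.0:
        return 3.0      # almost never ham (Windows-XP in this data)
    if ham < 10.0:
        return 2.0      # mostly spam (Windows-2000 in this data)
    return 0.0
```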

Traffic from the site's own PC clients must not be seen by p0f, otherwise one would
be penalizing the site's own users. This can be achieved either by separating the
MSA from the MTA, or by using a list of internal IP networks for exclusion.
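The "list of internal IP networks" option can be sketched in Python with the standard `ipaddress` module. The networks listed are placeholders (private and documentation ranges), not recommendations, and `should_fingerprint` is a hypothetical name.

```python
import ipaddress

# Hypothetical list of the site's own client networks; replace with your own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("2001:db8::/32"),
]


def should_fingerprint(client_ip: str) -> bool:
    """Only use OS fingerprints for SMTP clients outside our own networks."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so
    # mixing both families in one list is safe.
    return not any(addr in net for net in INTERNAL_NETWORKS)
```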


A quick summary from amavisd-new-2.4.0 release notes:

- experimental support for passive operating system fingerprinting with
  the use of externally running utility p0f, supplying collected information
  as a header field to SpamAssassin, making it possible to add rules to score
  SMTP client hosts based on educated guess about their operating system
  type and IP distance; see below for details;

Here are the installation details:

- passive operating-system fingerprinting (p0f) support lets SA gain
  information about SMTP client's operating system and estimated IP distance,
  and can reduce the number of bounces:

  * find and install the p0f utility: http://lcamtuf.coredump.cx/p0f.shtml
or in FreeBSD ports collection as 'net-mgmt/p0f';

  * start a p0f process on the same host where MTA (MX) is running, making
it listen only to incoming TCP sessions (to reduce its workload) to the
IP address and TCP port (25) where MTA is accepting incoming mail from
outside (it doesn't hurt to let it see other traffic too, it just isn't
needed); after testing p0f alone and seeing that it works, you may start
it up, feeding its output to the program p0f-analyzer.pl that comes with
amavisd-new package, e.g.:

  p0f -l 'tcp dst port 25' 2>&1 | p0f-analyzer.pl 2345 &

on multi-homed boxes one may need to specify the interface and IP address
where the MTA is listening; the filter syntax is the same as in tcpdump, e.g.:

  p0f -l -i bge0 'dst host 192.0.2.66 and tcp dst port 25' 2>&1 \
    | p0f-analyzer.pl 2345 &

  * the program p0f-analyzer.pl reads p0f reports on stdin, keeps a cache
for a limited time (10 minutes, configurable) of data about incoming TCP
sessions organized by remote IP address, and listens on UDP port 2345
(specified as its command line argument) for queries; only queries from
allowed IP addresses are accepted and responded to, other queries are
silently ignored - configure @inet_acl accordingly, defaults to 127.0.0.1;

  * adding the following line to amavisd.conf, matching the chosen port
number to the one specified on the command line to the p0f-analyzer.pl:

  $os_fingerprint_method = 'p0f:127.0.0.1:2345';

makes amavisd send queries to p0f-analyzer.pl (on the supplied IP address
and UDP port number) to collect information about remote SMTP client's OS;
collected response is then supplied as a header field when SpamAssassin
is invoked; query/response is very quick and imposes no burden on the amavisd
process, nor does it extend its processing time. The $os_fingerprint_method
setting is also a member of policy banks to make it more flexible to
disable fingerprinting for 

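The caching and access-control behaviour described for p0f-analyzer.pl in the installation notes above can be illustrated with a small Python class. This is a hypothetical stand-in: it does not parse p0f output or implement p0f-analyzer.pl's UDP query format, and `FingerprintCache` with its methods is an invented name. The clock is injectable so expiry can be tested.

```python
import ipaddress
import time
from typing import Callable, Dict, Optional, Tuple


class FingerprintCache:
    """Keep recent OS guesses per remote IP for a limited time, answer allowed peers only."""

    def __init__(self, ttl: float = 600.0,
                 acl: Tuple[str, ...] = ("127.0.0.1",),
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl                      # 10 minutes, as in the description
        self.acl = {ipaddress.ip_address(a) for a in acl}
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def record(self, remote_ip: str, os_guess: str) -> None:
        """Store the result of one p0f report, keyed by the remote IP."""
        self._entries[remote_ip] = (self.clock(), os_guess)

    def query(self, requester_ip: str, remote_ip: str) -> Optional[str]:
        """Return the cached guess, or None for unknown IPs and disallowed requesters."""
        if ipaddress.ip_address(requester_ip) not in self.acl:
            return None                     # silently ignore queries from outside the ACL
        self._expire()
        entry = self._entries.get(remote_ip)
        return entry[1] if entry else None

    def _expire(self) -> None:
        """Drop entries older than the TTL."""
        cutoff = self.clock() - self.ttl
        for ip in [ip for ip, (seen, _) in self._entries.items() if seen < cutoff]:
            del self._entries[ip]
```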
xxxl spam

2006-04-11 Thread mouss
since most filters skip large messages, it may be tempting for spammers 
to send large messages:


- using a large but invisible part (either by using MIME and putting a
large text part in an alternative MIME part, or using invisible chars
before their own text).


- using a large image

- large tail (spammers can append anything).

- unused attachments

questions:
- has this already been seen?
- how can we mitigate this?


my first thought would be to process the message before passing it to 
the filter. In particular, are there drawbacks/benefits if I remove 
attachments before passing the message to SA (or any other filter)?


Re: xxxl spam

2006-04-11 Thread Matt Kettler
mouss wrote:
 since most filters skip large messages, it may be tempting for spammers
 to send large messages:
 
 - using a large but invisible part (either by using mime and putting a
 large text part in an alternative mime, or using invisible chars
 before their own text).
 
 - using a large image
 
 - large tail (spammers can append anything).
 
 - unused attachments
 
 questions:
 - has this already been seen?

I've not seen it with dummy text, but I have seen the large image spam. However,
it's very rare. The problem being that if you're a large-volume spammer, large
messages take a longer time to send, and thus reduce your spams/minute.
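Matt's throughput argument is easy to put in numbers. A back-of-the-envelope Python helper, assuming a 1 Mbit/s sending uplink, a 5 KB text spam and a 500 KB image spam (roughly the "1/2 meg" JPEGs mentioned elsewhere in this thread); all three figures are assumptions, and SMTP round trips and protocol overhead are ignored, so real rates are lower.

```python
def messages_per_minute(uplink_bits_per_s: float, message_bytes: int) -> float:
    """Upper bound on messages per minute through a fixed uplink (no SMTP overhead)."""
    return uplink_bits_per_s * 60 / (message_bytes * 8)


# Assumed figures: a 1 Mbit/s uplink, 5 KB text spam vs. 500 KB image spam.
small = messages_per_minute(1_000_000, 5_000)    # 1500.0
large = messages_per_minute(1_000_000, 500_000)  # 15.0
```

For the same pipe, the image spammer gets a hundredth of the volume.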

There's only one spammer that's done this to me. There's some group of stores in
Guatemala that sends me high-res scans of their newspaper.

Consejeros en Finanzas Empresariales, some kind of bank
La Cuacao  - some kind of electronics shop? or an eye doctor?
cefesa hardware - a True Value hardware store.


Why anyone in Guatemala thinks I'll visit their store to spend Q. 22 on a
patio log fake fire log or Q. 85 on a generic brand weed and feed fertilizer
is beyond me.

But other than these guys, I don't get any spams > 250kb.

 - how can we mitigate this?

Personally, I think it is largely self-mitigating. Their size greatly limits
their potential distribution.

As I see it, there's very little large spam out there.

 
 
 my first thought would be to process the message before passing it to
 the filter. In particular, are there drawbacks/benefits if I remove
 attachments before passing them to SA (or any other filter)?

Well, SA automatically ignores attachments in recent versions. However,
hash-based plugins like razor, dcc, and pyzor work best when seeing all the
attachments.






Re: xxxl spam

2006-04-11 Thread Theo Van Dinter
On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote:
 Well, SA automatically ignores attachments in recent versions. However,
 hash-based plugins like razor, dcc, and pyzor work best when seeing all the
 attachments.

For completeness, the first sentence isn't exactly true.
SA automatically ignores attachments for the standard set of body,
header, and uri rules, but it still has to read in the data, store it in
the message tree internally, and make the entire message text available
for full rules.

There are also things like the AntiVirus plugin, etc, which may go ahead
and decode attachments and do things with the data.  I could easily see
a plugin for ClamAV, or something scanning image files, etc.

I think that at some point, the default size could go up, but I wouldn't
try it for now.

-- 
Randomly Generated Tagline:
 Zoidberg: That's where I'm meeting Uncle Zoid for lunch to 
  discuss my Hollywood dream. The next time you see me, don't
  be surprised if I've eaten. 




Re: xxxl spam

2006-04-11 Thread Matt Kettler
Theo Van Dinter wrote:
 On Tue, Apr 11, 2006 at 02:14:26PM -0400, Matt Kettler wrote:
 Well, SA automatically ignores attachments in recent versions. However,
 hash-based plugins like razor, dcc, and pyzor work best when seeing all the
 attachments.
 
 For completeness, the first sentence isn't exactly true.
 SA automatically ignores attachments for the standard set of body,
 header, and uri rules, but it still has to read in the data, store it in
 the message tree internally, and make the entire message text available
 for full rules.

Fair enough...

 There are also things like the AntiVirus plugin, etc, which may go ahead
 and decode attachments and do things with the data.  I could easily see
 a plugin for ClamAV, or something scanning image files, etc.
 
 I think that at some point, the default size could go up, but I wouldn't
 try it for now.


FWIW, it might be worth considering the approach used by MailScanner.

MailScanner still scans large messages, but truncates messages over "Max
SpamAssassin Size". Presumably it does so in a manner that still has the correct
MIME boundaries, because I don't get any kind of superfluous rule hits regarding
MIME boundaries on large messages.

I've currently got this set to 60k, but MailScanner defaults to 30k.

Of course, this can't work if you're using any kind of encapsulation options in
report_safe, but since MailScanner does all the markup itself, it doesn't hurt
it to send Mail::SpamAssassin a truncated version. Converting this to the
spamc/spamd model might be kind of difficult due to this, but it's worth
considering for spamc -c.
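MailScanner's exact truncation method isn't described here, so the following is only a guess at how boundary-safe truncation might work: cut at a line boundary below the size limit and, for a multipart message, append the top-level closing boundary so the parser doesn't see an unterminated multipart. Nested multiparts are left unclosed, and the limit must be well above the header size. Python, using the standard `email` header parser; `truncate_for_scan` is a hypothetical name.

```python
from email.parser import BytesHeaderParser


def truncate_for_scan(raw: bytes, max_size: int) -> bytes:
    """Cut a message to about max_size bytes for scanning only (never for delivery).

    Sketch under stated assumptions, not MailScanner's actual algorithm.
    """
    if len(raw) <= max_size:
        return raw
    cut = raw.rfind(b"\n", 0, max_size)          # stay on a line boundary
    head = raw[: cut + 1] if cut != -1 else raw[:max_size]
    boundary = BytesHeaderParser().parsebytes(raw).get_boundary()
    if boundary is None:                          # not multipart: plain cut is enough
        return head
    if not head.endswith(b"\n"):
        head += b"\n"
    # Close the top-level multipart so the truncated copy still parses cleanly.
    return head + b"--" + boundary.encode("ascii") + b"--\n"
```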







Re: xxxl spam

2006-04-11 Thread Theo Van Dinter
On Tue, Apr 11, 2006 at 02:46:41PM -0400, Matt Kettler wrote:
 Of course, this can't work if you're using any kind of encapsulation options 
 in
 report_safe, but since MailScanner does all the markup itself, it doesn't hurt
 it to send Mail::SpamAssassin a truncated version. Converting this to the
 spamc/spamd model might be kind of difficult due to this, but it's worth
 considering for spamc -c.

It's been suggested before, but it doesn't quite work for SA
unfortunately.  SA is designed to be a generic mail filter, and some
rules/plugins/etc expect to be able to see the entire original contents
of the message, so we can't really trim off pieces.  Also, things like
spamc have no concept of what a message actually is, they just read in
a bunch of data and send it somewhere, so the full message would have to
be read in by spamd before anything could be trimmed off of it.  At that
point there's not a lot of savings in trimming off attachments (though the
raw versions could potentially be stored in temp files instead of memory).

And then, as you said, with encapsulation and such, we'd need the whole of the
message anyway.

-- 
Randomly Generated Tagline:
NT is secure as long as you don't remove the shrink wrap. - G. Myers




Re: xxxl spam

2006-04-11 Thread Matt Kettler
Theo Van Dinter wrote:
 On Tue, Apr 11, 2006 at 02:46:41PM -0400, Matt Kettler wrote:
 Of course, this can't work if you're using any kind of encapsulation options
 in report_safe, but since MailScanner does all the markup itself, it doesn't
 hurt it to send Mail::SpamAssassin a truncated version. Converting this to the
 spamc/spamd model might be kind of difficult due to this, but it's worth
 considering for spamc -c.
 
 It's been suggested before, but it doesn't quite work for SA
 unfortunately.  SA is designed to be a generic mail filter, and some
 rules/plugins/etc expect to be able to see the entire original contents
 of the message, so we can't really trim off pieces.  Also, things like
 spamc have no concept of what a message actually is, they just read in
 a bunch of data and send it somewhere, so the full message would have to
 be read in by spamd before anything could be trimmed off of it.  At that
 point there's not a lot of savings in trimming off attachments (though the
 raw versions could potentially be stored in temp files instead of memory).
 
 And then, as you said, with encapsulation and such, we'd need the whole of the
 message anyway.

Agreed. The only part of SA that this would be straightforward for would be
spamc -c.

At that point, spamc isn't piping the message back out, and isn't doing
encapsulation, so truncation would be irrelevant.


Re: xxxl spam

2006-04-11 Thread Mark Martinec
mouss wrote:
 since most filters skip large messages, it may be tempting for spammers
 to send large messages:

I did some statistical analysis a few weeks ago with SA 3.1.1
(SA called from amavisd-new, but that is beside the point).

Please see:

  http://www.ijs.si/software/amavisd/fig4.gif
Shows spam score vs. mail size as a scattergram

  http://www.ijs.si/software/amavisd/fig5.gif
Shows elapsed time for mail checking vs. mail size
(shown is total time, but 90% of it reflects processing
within SA and its plugins)

As a curiosity (but off topic), harvesting results from p0f
(passive operating system fingerprinting), here are two more:

  http://www.ijs.si/software/amavisd/fig1.gif
Spam score vs. IP distance in hops (our server is
in European academic network Geant)

  And perhaps most interesting of all (but again OT):

  http://www.ijs.si/software/amavisd/fig2.gif
Spam score distribution as a percentage of all mail,
separated by each sending mail client's operating system.

Mark


relay distance and spam [was xxxl spam]

2006-04-11 Thread mouss

Mark Martinec wrote:

  http://www.ijs.si/software/amavisd/fig1.gif
Spam score vs. IP distance in hops (our server is
in European academic network Geant)



This one is amazing. there seems to be an empty space (most mail has
nhops <= 10 or >= 14). I would guess that most ham with large nhops is
from mailing lists. so the question is what the graph would look like if you
take into account:

- mailing lists forwarding
- multiple internal hops at either sender or receiver (I have N 
Received headers added by my own MTA. and for mail fetched from an MSP, 
there are still more).


I would conjecture that most legitimate mail has two real hops (the 
sending MTA and the receiving MTA).


RE: relay distance and spam [was xxxl spam]

2006-04-11 Thread Matthew.van.Eerde
mouss wrote:
 I would conjecture that most legitimate mail has two real hops (the
 sending MTA and the receiving MTA).

That would be one hop.


Re: xxxl spam

2006-04-11 Thread Kenneth Porter
On Tuesday, April 11, 2006 2:14 PM -0400 Matt Kettler 
[EMAIL PROTECTED] wrote:



I've not seen it with dummy text, but I have seen the large image spam.
However, it's very rare. The problem being that if you're a large-volume
spammer, large messages take a longer time to send, and thus reduce your
spams/minute.


You can also impose this cost on spammers by enabling the GreetPause 
feature in the more recent versions of sendmail. This tells sendmail not to 
answer right away when receiving a connection, and to drop the connection 
if anything is received before the greeting is sent out. This punishes 
slammer spammers who push the whole SMTP conversation through and then 
disconnect. It also ensures that every connection from an unknown sender 
takes a minimum amount of time. You can add exceptions in your access 
database for your customers and frequent correspondents. For example, this 
exception drops the GreetPause to zero for my LAN (example is for 
10.123/16):


GreetPause:10.123   0
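A rough Python/asyncio sketch of the GreetPause idea: wait before sending the banner, and treat any bytes received during the wait as grounds to drop the connection, with per-network overrides in the spirit of the `GreetPause:10.123 0` access entry. The 5-second default, the override table, the banner host name and the function names are assumptions; this is not how sendmail implements the feature.

```python
import asyncio
import ipaddress

DEFAULT_PAUSE = 5.0  # seconds; an illustrative value, not sendmail's default

# Hypothetical per-network overrides, like "GreetPause:10.123  0" (10.123/16).
PAUSE_OVERRIDES = {ipaddress.ip_network("10.123.0.0/16"): 0.0}


def pause_for(client_ip: str) -> float:
    """Pick the greeting delay for a client, honouring per-network overrides."""
    addr = ipaddress.ip_address(client_ip)
    for net, pause in PAUSE_OVERRIDES.items():
        if addr in net:
            return pause
    return DEFAULT_PAUSE


async def spoke_before_greeting(reader: asyncio.StreamReader, pause: float) -> bool:
    """Wait `pause` seconds; return True if the client sent anything meanwhile."""
    if pause <= 0:
        return False
    try:
        data = await asyncio.wait_for(reader.read(1), timeout=pause)
    except asyncio.TimeoutError:
        return False       # client waited politely: OK to send the 220 banner
    return data != b""     # b"" means the client hung up rather than talked


async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    """Handler for asyncio.start_server; the SMTP dialogue itself is omitted."""
    client_ip = writer.get_extra_info("peername")[0]
    if await spoke_before_greeting(reader, pause_for(client_ip)):
        writer.close()     # slammer: drop the connection without a greeting
        await writer.wait_closed()
        return
    writer.write(b"220 mx.example.net ESMTP\r\n")   # hypothetical host name
    await writer.drain()
    # ... the normal SMTP conversation would continue here ...
    writer.close()
    await writer.wait_closed()
```

It could be served with `asyncio.start_server(handle_client, port=2525)`, the port chosen arbitrarily for testing.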



greetpause was Re: xxxl spam

2006-04-11 Thread Michele Neylon:: Blacknight.ie
Kenneth Porter wrote:

 You can also impose this cost on spammers by enabling the GreetPause
 feature in the more recent versions of sendmail. This tells sendmail not
 to answer right away when receiving a connection, and to drop the
 connection if anything is received before the greeting is sent out. This
 punishes slammer spammers who push the whole SMTP conversation through
 and then disconnect. It also ensures that every connection from an
 unknown sender takes a minimum amount of time. You can add exceptions in
 your access database for your customers and frequent correspondents. For
 example, this exception drops the GreetPause to zero for my LAN (example
 is for 10.123/16):
 
 GreetPause:10.123   0


Is this as effective as greylisting?


-- 
Mr Michele Neylon
Blacknight Solutions
Quality Business Hosting & Colocation
http://www.blacknight.ie/
Tel. 1850 927 280
Intl. +353 (0) 59  9183072
Direct Dial: +353 (0)59 9183090
Fax. +353 (0) 59  9164239


Re: greetpause was Re: xxxl spam

2006-04-11 Thread Mike Jackson

You can also impose this cost on spammers by enabling the GreetPause
feature in the more recent versions of sendmail. This tells sendmail not
to answer right away when receiving a connection, and to drop the
connection if anything is received before the greeting is sent out. This
punishes slammer spammers who push the whole SMTP conversation through
and then disconnect. It also ensures that every connection from an
unknown sender takes a minimum amount of time. You can add exceptions in
your access database for your customers and frequent correspondents. For
example, this exception drops the GreetPause to zero for my LAN (example
is for 10.123/16):

GreetPause:10.123   0



Is this as effective as greylisting?


Perhaps not, but it also doesn't have any of the drawbacks (i.e., delayed 
mail, need to whitelist non-behaving servers, etc.). I recently enabled it 
on my servers, and it's been stopping a ton of mail without any complaints 
from legitimate senders. 



Re: greetpause was Re: xxxl spam

2006-04-11 Thread mouss

Mike Jackson wrote:

You can also impose this cost on spammers by enabling the GreetPause
feature in the more recent versions of sendmail. This tells sendmail not
to answer right away when receiving a connection, and to drop the
connection if anything is received before the greeting is sent out. This
punishes slammer spammers who push the whole SMTP conversation through
and then disconnect. It also ensures that every connection from an
unknown sender takes a minimum amount of time. You can add exceptions in
your access database for your customers and frequent correspondents. For
example, this exception drops the GreetPause to zero for my LAN (example
is for 10.123/16):

GreetPause:10.123   0



Is this as effective as greylisting?


Perhaps not, but it also doesn't have any of the drawbacks (i.e., delayed 
mail, need to whitelist non-behaving servers, etc.). I recently enabled 
it on my servers, and it's been stopping a ton of mail without any 
complaints from legitimate senders.





greetpause only blocks some ratware spam. If I were to write spam and/or
viruses, I would just add a sleep(x):


given N victims, choose M among them:

for (i = 0; i < M; i++)
    time(i) = connect to server(i)
for (i = 0; i < M; i++)
    t = min_sleep - (now - time(i))
    if (t > 0) sleep(t)
    send_junk()
(can be tuned if using multiple threads/processes).


so greetpause will certainly stop some ratware spam, but is not a full 
solution.


also, if your greetpause requires sleep()-ing on every connection, then 
it's not acceptable (for me) as this is an invitation to DoS. I am not aware 
of any async MTA [read: one that will not sleep, but will handle other 
connections in the meantime], at least in the open source world.


If you are after miscreants, then partial-greylisting is probably more 
effective (I mean greylisting some of the connections, based on the 
client name, ip, behaviour, ... etc).
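Partial greylisting as described, greylisting only clients that look suspicious, might look like this in Python. The "suspicious" test (no reverse DNS, or a generic-looking hostname), the regex, the 300-second delay and the (client IP, sender, recipient) triplet are all assumptions for illustration, and `PartialGreylist` is a hypothetical name.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical heuristic for "looks like a dynamic/end-user host".
GENERIC_RDNS = re.compile(r"(\d+[-.]){3}\d+|dsl|dyn|dial|cable|pool", re.IGNORECASE)


class PartialGreylist:
    """Greylist only suspicious clients, keyed on (client IP, sender, recipient)."""

    def __init__(self, delay: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.delay = delay                    # seconds before a retry is accepted
        self.clock = clock
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    @staticmethod
    def suspicious(client_name: Optional[str]) -> bool:
        """No reverse DNS, or a generic-looking hostname."""
        return not client_name or GENERIC_RDNS.search(client_name) is not None

    def check(self, client_ip: str, client_name: Optional[str],
              sender: str, recipient: str) -> str:
        """Return "accept", or "tempfail" (answer with an SMTP 4xx code)."""
        if not self.suspicious(client_name):
            return "accept"                   # well-named clients skip greylisting
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, self.clock())
        return "accept" if self.clock() - first >= self.delay else "tempfail"
```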








Re: relay distance and spam [was xxxl spam]

2006-04-11 Thread mouss

[EMAIL PROTECTED] wrote:

mouss wrote:

I would conjecture that most legitimate mail has two real hops (the
sending MTA and the receiving MTA).


That would be one hop.




depends on how you count:

MUA -> my MTA1 -> your MTA -> your mailbox

that's two MTAs, so that's two hops. I prefer to count it this way 
because this corresponds to Received headers.



a direct mail would be
MUA -> MTA -> mailbox
and is either:
- legitimate from trusted sources
- direct spam
- an exception

if you have an internal MTA and a relay host, or if you have an MTA and 
relay via an ISP, that adds a hop


If you can remove the reception hops (since you know them, you can 
ignore them in your computations), most legitimate cross-domain mail 
would be 2-hop mail (this is what I believe).
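Counting hops this way (one per Received header, minus the ones added inside your own site) could be sketched in Python as below. The host names in `OWN_MTAS` are hypothetical placeholders, and the regexes only handle simple `from X ... by Y` headers; real Received headers vary a lot. A header is skipped only when both its `from` and `by` hosts are ours, so the hop onto your MX still counts.

```python
import re
from email import message_from_string

# Hypothetical names of the receiving site's own MTAs and filter hosts.
OWN_MTAS = {"mx1.example.net", "filter.example.net"}

FROM_HOST = re.compile(r"\bfrom\s+([^\s;()]+)", re.IGNORECASE)
BY_HOST = re.compile(r"\bby\s+([^\s;()]+)", re.IGNORECASE)


def _host(pattern: re.Pattern, header: str) -> str:
    """Extract and normalise the host following `from`/`by`, or "" if absent."""
    m = pattern.search(header)
    return m.group(1).lower().rstrip(".") if m else ""


def external_hops(raw_message: str, own_mtas=frozenset(OWN_MTAS)) -> int:
    """Count Received headers, skipping relays entirely inside our own site."""
    msg = message_from_string(raw_message)
    hops = 0
    for received in msg.get_all("Received", []):
        internal = (_host(FROM_HOST, received) in own_mtas
                    and _host(BY_HOST, received) in own_mtas)
        if not internal:
            hops += 1
    return hops
```

For the flow `MUA -> my MTA1 -> your MTA -> your mailbox`, the two Received headers give 2, however many internal filter hops the receiving side adds afterwards.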




RE: greetpause was Re: xxxl spam

2006-04-11 Thread Matthew.van.Eerde
mouss wrote:
 so greetpause will certainly stop some ratware spam, but is not a
 full solution.

Agreed.  Spammers have access to all the free CPU bandwidth and processing time 
they can steal - legitimate MTAs are limited to a budget.  Any anti-spam 
solution that simply rewards CPU and bandwidth spent* is playing into the hands 
of the spammers.

* Email stamps, "factor this product of large primes" challenges, greetpause

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: relay distance and spam [was xxxl spam]

2006-04-11 Thread Kelson

mouss wrote:

- multiple internal hops at either sender or receiver (I have N Received
headers added by my own MTA. and for mail fetched from an MSP, there are
still more).


Actually, if I'm reading this right, it's the number of IP hops between
the sending server and the receiving server -- in other words, how many
lines you'd see if you were on the receiving server and ran traceroute 
to the sending MTA.


I've rarely seen any messages that passed through more than 5 MTAs --
certainly not enough to account for the graph.  But 10 routers between 
me and the sender?  That doesn't seem unreasonable at all.
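Kelson's reading fits how passive fingerprinting usually estimates distance: from the IP TTL of the client's packets, not from Received headers. A common heuristic, sketched in Python: assume the sender started from the nearest standard initial TTL at or above the observed value (32, 64, 128 or 255) and subtract. This is an approximation; as far as I know p0f pairs the observed TTL with the initial TTL of the OS signature it matched, so its numbers may differ.

```python
# Initial TTLs commonly used by IP stacks (e.g. 64 for Linux, 128 for Windows);
# treat this list as an assumption.
INITIAL_TTLS = (32, 64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Estimate router hops from the TTL seen on an incoming packet."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = next(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl
```

A packet arriving with TTL 116, for example, would be estimated at 12 hops (128 - 116).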


--
Kelson Vibber
SpeedGate Communications www.speed.net


Re: relay distance and spam [was xxxl spam]

2006-04-11 Thread Mathias Homann
On Tuesday, 11 April 2006 22:28, mouss wrote:
 [EMAIL PROTECTED] wrote:
  mouss wrote:
  I would conjecture that most legitimate mail has two real hops
  (the sending MTA and the receiving MTA).
 
  That would be one hop.

 depends on how you count:

   MUA - my MTA1 - your MTA - your mailbox

 that's two MTAs, so that's two hops. I prefer to count it this way
 because this corresponds to Received headers.

well, here it looks like this:

MUA - sender's MTA - my external MTA -(fetchmail)- my internal MTA 
- one internal hop through spamassassin - one internal hop through 
antivirus - my MUA

and at my workplace it's a similar setup, without the fetchmail.


bye,
MH


RE: relay distance and spam [was xxxl spam]

2006-04-11 Thread Matthew.van.Eerde
Kelson wrote:
 Actually, if I'm reading this right, it's the number of IP hops
 between the sending server and the receiving server -- in other
 words, how many lines you'd see if you were on the receiving server
 and ran traceroute to the sending MTA.

Ah... that makes much more sense :)



Spam and the Internet [Was: xxxl spam]

2006-04-11 Thread mouss

Matt Kettler wrote:


There's only one spammer that's done this to me. There's some group of stores in
Guatemala that sends me high-res scans of their newspaper.

Consejeros en Finanzas Empresariales, some kind of bank
La Cuacao  - some kind of electronics shop? or an eye doctor?
cefesa hardware - a True Value hardware store.


Why anyone in Guatemala thinks I'll visit their store to spend Q. 22 on a
patio log fake fire log or Q. 85 on a generic brand weed and feed fertilizer
is beyond me.



Dunno, but I can tell you that the net is full of people who love me and 
want what's best for me. I keep winning all the lotteries. I can buy 
software at cheap prices (if someone could tell these guys that I have 
neither the time nor the need to use Photoshop, and that I already have 
Windows+Office, etc., that might save them some time/resources they 
could spend helping me in other areas :). Others seem to need an urgent 
contact for an important relationship. I'm feeling like the CEO of a 
large company. Some even seem to know private info about me. It seems I 
need some special pills, but for now the names of the pills seem to 
change all the time. I'll wait until they agree on what to call them :). 
They also keep talking about inches. If someone could tell them that we 
use the metric system here, I would be grateful...


Re: Spam and the Internet [Was: xxxl spam]

2006-04-11 Thread Matt Kettler
mouss wrote:
 Matt Kettler wrote:


 Why anyone in Guatemala thinks I'll visit their store to spend Q. 22
 on a patio log fake fire log or Q. 85 on a generic brand weed and feed
 fertilizer is beyond me.

 
 Dunno, but I can tell you that the net is full of people who love me and
 want what's best for me. I keep winning all the lotteries. I can buy
 software at cheap prices (if someone could tell these guys that I have
 neither the time nor the need to use Photoshop, and that I already have
 Windows+Office, etc., that might save them some time/resources they
 could spend helping me in other areas :). Others seem to need an urgent
 contact for an important relationship. I'm feeling like the CEO of a
 large company. Some even seem to know private info about me. It seems I
 need some special pills, but for now the names of the pills seem to
 change all the time. I'll wait until they agree on what to call them :).
 They also keep talking about inches. If someone could tell them that we
 use the metric system here, I would be grateful...

Yeah, but all those actually have some chance of financial gain from someone
located in another country. You'd have to be stupid, but it's possible, because
all of those can close an electronic transaction with you.


These spams I get from .gt don't offer any kind of online ordering. They are ads
you could only take advantage of by physically traveling to the store in
Guatemala. They're ordinary weekly sales fliers for an ordinary local store that's
so small that only 6 cars can park in front of it. (They have pictures of the
store in some of them.) They are delivered to my mailbox as half-megabyte .jpg
files. It's really quite bizarre, and amusing.

Here's one, if you want to see it:

http://mywebpages.comcast.net/mkettler/spam.jpg


There's pretty close to zero chance that anyone in the US is going to hop on a
plane and fly to Guatemala to buy ordinary lawn care products from a small
store. But that's the kind of ads I'm getting.


Re: relay distance and spam [was xxxl spam]

2006-04-11 Thread Mark Martinec
On Tuesday April 11 2006 23:17, Kelson wrote:
 mouss wrote:
  - multiple internal hops at either sender or receiver (I have N
  Received headers added by my own MTA. and for mail fetched from an MSP,
  there are still more).

 Actually, if I'm reading this right, it's the number of IP hops between
 the sending server and the receiving server -- in other words, how many
 lines you'd see if you were on the receiving server and ran traceroute
 to the sending MTA.

Exactly. It is usually the number of hops a traceroute running on the MTA
would show when tracing the route to the host from which it is receiving a 
message. (I say "usually" because routes can be asymmetric, and what we 
actually observe is the remaining TTL field value in the IP packet, combined
with an educated guess at its initial setting based on the detected
OS type.)
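The TTL arithmetic can be sketched in a few lines of Python. Instead of the OS-fingerprint-based guess described above, this sketch substitutes a cruder heuristic: assume the sender started from the smallest common default initial TTL (32, 64, 128 or 255) that is not below the observed value. The function name and the default list are assumptions for illustration, and the heuristic misreads very distant hosts (a host starting at 128 that is 70 hops away would look like a nearby host starting at 64).

```python
from typing import Optional

# Default initial TTLs commonly used by different operating systems.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def estimate_hops(observed_ttl: int, initial_ttl: Optional[int] = None) -> int:
    """Estimate how many routers decremented the TTL on the way to us.

    If initial_ttl is not known (e.g. from an OS fingerprint), assume the
    smallest common default that is >= the observed TTL.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("observed TTL must be in 1..255")
    if initial_ttl is None:
        initial_ttl = next(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    if observed_ttl > initial_ttl:
        raise ValueError("observed TTL exceeds the assumed initial TTL")
    return initial_ttl - observed_ttl


print(estimate_hops(52))   # 12: assumes an initial TTL of 64
print(estimate_hops(113))  # 15: assumes an initial TTL of 128
```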

Btw, the horizontal spread of 1 unit in fig. 1 is artificial random noise,
added to spread out the numerous dots somewhat for a better view.

I guess we are somewhat lucky to see a rather clear-cut separation of
nearby friendly hosts and distant wild-world hosts, and we can use IP
distance to add a little score weight for distant hosts and subtract a
little for nearby hosts.
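One way such a small adjustment could look, as a Python sketch: nearby hosts get a small negative score, distant ones a small positive score, with a linear ramp in between. The thresholds (6 and 16 hops) and the weight (0.5) are hypothetical values chosen for illustration, not figures from the measurements discussed here; they would have to be tuned against real traffic.

```python
def distance_score(hops: int, near: int = 6, far: int = 16,
                   weight: float = 0.5) -> float:
    """Map an estimated hop distance to a small score adjustment.

    Hosts at or below `near` hops get -weight, hosts at or beyond `far`
    get +weight, with a linear ramp in between. All defaults are
    hypothetical and would need tuning against real traffic.
    """
    if hops < 0:
        raise ValueError("hop count cannot be negative")
    if far <= near:
        raise ValueError("far must be greater than near")
    if hops <= near:
        return -weight
    if hops >= far:
        return weight
    fraction = (hops - near) / (far - near)
    return -weight + 2 * weight * fraction


for h in (3, 11, 20):
    print(h, distance_score(h))  # -0.5, 0.0, 0.5
```

A ramp rather than a hard cutoff keeps hosts near the boundary from flipping between the two extremes.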

  Mark


RE: greetpause was Re: xxxl spam

2006-04-11 Thread Kenneth Porter

On Tuesday, April 11, 2006 1:37 PM -0700 [EMAIL PROTECTED] wrote:


Agreed.  Spammers have access to all the free CPU bandwidth and
processing time they can steal - legitimate MTAs are limited to a budget.
Any anti-spam solution that simply rewards CPU and bandwidth spent* is
playing into the hands of the spammers.


The original concern was that spammers would use larger messages to avoid 
the size cutoff in SA, but this was countered because spammers have to 
reduce their message rate to send larger messages. Server-side, GreetPause 
(and greylisting) forces a client to reduce its message rate.


If the client has unlimited bandwidth and doesn't care about the reduced 
message rate, it might as well shovel giant messages. In for a penny, in 
for a pound.
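The rate argument can be put into numbers with a deliberately simplified Python model: one message per connection, and the only per-message costs are the greeting pause and the transfer time. The 5-second pause and 1 Mbit/s link are hypothetical figures chosen for illustration.

```python
def greetpause_rates(pause_s: float, size_bytes: float,
                     bandwidth_bps: float) -> tuple[float, float]:
    """Messages/hour and bytes/hour for one connection slot.

    Simplified model: one message per connection, and the only costs are
    the greeting pause and the transfer time (bandwidth in bits/second).
    """
    seconds_per_message = pause_s + size_bytes * 8 / bandwidth_bps
    messages_per_hour = 3600 / seconds_per_message
    return messages_per_hour, messages_per_hour * size_bytes


for size in (5_000, 500_000):
    msgs, volume = greetpause_rates(5, size, 1_000_000)
    print(f"{size:>7} bytes: {msgs:6.1f} msgs/h, {volume / 1e6:6.1f} MB/h")
```

Going from 5 KB to 500 KB messages cuts the rate from about 714 to 400 messages per hour, but the volume pushed through each connection slot grows from about 3.6 MB to 200 MB, because the fixed pause becomes a smaller share of each transaction.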