Re: spam wanted :)

2008-04-10 Thread Rich Kulawiec

On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
 for a measurement experiment, i would like O(100k) *headers* from spam
 from europe and a similar sample from the states.

Request for clarification: do you mean spam originating at IP addresses
believed to be in Europe or spam received at a mail server located in
Europe or spam putatively from domains in Europe or something else?

---Rsk


Re: spam wanted :)

2008-04-10 Thread William Waites

On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:
 
 On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
  for a measurement experiment, i would like O(100k) *headers* from spam
  from europe and a similar sample from the states.
 
 Request for clarification: do you mean spam originating at IP addresses
 believed to be in Europe or spam received at a mail server located in
 Europe or spam putatively from domains in Europe or something else?

One thing that happened when I moved to Europe and started doing
business in Germany is that relatively soon I began receiving spam in
German (which seems to have quite different content, and sales
strategy, actually, perhaps reflecting cultural differences in the
manner of buying and selling between the anglophone world and Germany).

Trying to separate out what "in Europe" means in this case seems to come
down to having given out email addresses to web sites and colleagues in
a different language environment rather than physical presence of either
myself or my mailserver in either North America or Europe. I guess the
German spam I have been receiving is only "European" in that German
speakers happen to be mostly in Europe, which is not true of English
speakers.

I wonder, is the (English language) spam set that one is likely to receive
in Australia statistically different than what one is likely to receive in
the US?

-w


Re: spam wanted :)

2008-04-10 Thread Marshall Eubanks



On Apr 10, 2008, at 9:35 AM, William Waites wrote:

 On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:

 On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
 for a measurement experiment, i would like O(100k) *headers* from spam
 from europe and a similar sample from the states.

 Request for clarification: do you mean spam originating at IP addresses
 believed to be in Europe or spam received at a mail server located in
 Europe or spam putatively from domains in Europe or something else?

 One thing that happened when I moved to Europe and started doing
 business in Germany is that relatively soon I began receiving spam in
 German (which seems to have quite different content, and sales
 strategy, actually, perhaps reflecting cultural differences in the
 manner of buying and selling between the anglophone world and Germany).

I receive serious amounts of spam in Hebrew and Russian, and haven't even
been to either Israel or Russia recently.

Regards
Marshall

 Trying to separate out what "in Europe" means in this case seems to come
 down to having given out email addresses to web sites and colleagues in
 a different language environment rather than physical presence of either
 myself or my mailserver in either North America or Europe. I guess the
 German spam I have been receiving is only "European" in that German
 speakers happen to be mostly in Europe, which is not true of English
 speakers.

 I wonder, is the (English language) spam set that one is likely to receive
 in Australia statistically different than what one is likely to receive in
 the US?

 -w




Re: spam wanted :)

2008-04-10 Thread Randy Bush

Rich Kulawiec wrote:
 On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
 for a measurement experiment, i would like O(100k) *headers* from spam
 from europe and a similar sample from the states.
 Request for clarification: do you mean spam originating at IP addresses
 believed to be in Europe

yes.

and, because i have gotten a lot of well-meaning but non-reading offers,
to repeat

 this would be a straight sample, before filtering, ip address
 blocking, etc.

i realize this is difficult, as all of us go through much effort to
reject this stuff as early as possible.  but it will be a sample
unbiased by your filtering techniques.

randy


RE: spam wanted :)

2008-04-10 Thread Jamie Bowden

s/recently/ever/

I'd be happy if I could tell Gmail to delete anything in a non-Roman
character set.  I don't read Hebrew, Arabic, Kanji, Hangul, Cyrillic, or
any of the other various character sets I get spam in.
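
A rough sketch of the kind of rule described above, assuming the filter
only looks at the Subject header. The function names and the 0.5 threshold
are made up for illustration and are not anything Gmail offers; the idea is
to decode RFC 2047 encoded-words and measure how many letters fall outside
the Latin script.

```python
import unicodedata
from email.header import decode_header, make_header


def decoded_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?B?...?=) into plain text."""
    return str(make_header(decode_header(raw_subject)))


def non_latin_fraction(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name is not LATIN."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = [
        ch for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    ]
    return len(non_latin) / len(letters)


def looks_non_latin(raw_subject: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag subjects that are mostly non-Latin letters."""
    return non_latin_fraction(decoded_subject(raw_subject)) > threshold
```

Accented Latin letters (as in German or Icelandic) still count as Latin
here, so this would only catch scripts such as Cyrillic, Hebrew or Hangul,
not a change of language within the Latin alphabet.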

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Marshall Eubanks
Sent: Thursday, April 10, 2008 9:39 AM
To: William Waites
Cc: Rich Kulawiec; North American Network Operators Group
Subject: Re: spam wanted :)



On Apr 10, 2008, at 9:35 AM, William Waites wrote:

 On Thu, Apr 10, 2008 at 08:55:21AM -0400, Rich Kulawiec wrote:

 On Thu, Apr 10, 2008 at 06:32:53PM +0900, Randy Bush wrote:
 for a measurement experiment, i would like O(100k) *headers* from spam
 from europe and a similar sample from the states.

 Request for clarification: do you mean spam originating at IP addresses
 believed to be in Europe or spam received at a mail server located in
 Europe or spam putatively from domains in Europe or something else?

 One thing that happened when I moved to Europe and started doing
 business in Germany is that relatively soon I began receiving spam in
 German (which seems to have quite different content, and sales
 strategy, actually, perhaps reflecting cultural differences in the
 manner of buying and selling between the anglophone world and Germany).

I receive serious amounts of spam in Hebrew and Russian, and haven't even
been to either Israel or Russia recently.

Regards
Marshall

 Trying to separate out what "in Europe" means in this case seems to come
 down to having given out email addresses to web sites and colleagues in
 a different language environment rather than physical presence of either
 myself or my mailserver in either North America or Europe. I guess the
 German spam I have been receiving is only "European" in that German
 speakers happen to be mostly in Europe, which is not true of English
 speakers.

 I wonder, is the (English language) spam set that one is likely to receive
 in Australia statistically different than what one is likely to receive in
 the US?

 -w



Re: spam wanted :)

2008-04-10 Thread Randy Bush

 Request for clarification: do you mean spam originating at IP addresses
 believed to be in Europe
 yes.

blush a!  speaking of non-reading blush

i mean spam arriving at port 25 on a european host.  and an unfiltered
unblocked port 25, no dnsbl, ...
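
For anyone contributing, a minimal sketch of keeping only the headers of
each captured message, assuming the raw messages are already available as
bytes. How they are captured on port 25 is left open, and the function name
is hypothetical. It uses the standard library's header-only parser, so the
body is never parsed or stored.

```python
from email.parser import BytesHeaderParser


def headers_only(raw_message: bytes) -> list[tuple[str, str]]:
    """Return the (name, value) header pairs of a raw RFC 5322 message.

    The header-only parser stops interpreting at the first blank line,
    so body content does not end up in the returned list.
    """
    msg = BytesHeaderParser().parsebytes(raw_message)
    return list(msg.items())
```

Repeated headers such as Received are kept in their original order, which
matters if the sample is meant to show the path each message took.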

it looks like i have a great stateside volunteer source, though the
proof will be known when we have the data.  and we're in asia and have
data from here.  so it's europe i need.

randy


Re: spam wanted :)

2008-04-10 Thread Bjørn Mork

Randy Bush [EMAIL PROTECTED] writes:

 this would be a straight sample, before filtering, ip address
 blocking, etc.

 i realize this is difficult, as all of us go through much effort to
 reject this stuff as early as possible.  but it will be a sample
 unbiased by your filtering techniques.

How do you classify email as spam without adding bias?


Bjørn


Re: spam wanted :)

2008-04-10 Thread Joe Greco

 Randy Bush [EMAIL PROTECTED] writes:
 
  this would be a straight sample, before filtering, ip address
  blocking, etc.
 
  i realize this is difficult, as all of us go through much effort to
  reject this stuff as early as possible.  but it will be a sample
  unbiased by your filtering techniques.
 
 How do you classify email as spam without adding bias?

You can always claim bias.

There's often been debate, even in the anti-spam community, about what
"spam" actually means.  The meaning has repeatedly been diluted over the
years, to a point where some now define it merely as "that which we do
not want", an attitude supported in code by some service providers who
now sport great big Easy Buttons (with apologies to any office supply
chain) labelled "This Is Spam".

Even so, there's some complexity - users making typos, for example.

However, the easiest way to avoid bias is to look for a mail stream that
has the quality of not having any valid recipients.  There will be, of 
course, someone who will disagree with me that mail sent to an address 
that hasn't been valid in years, and whose parent domain was unresolvable
in DNS for at least a year is spam.  However, it's as unbiased as I can
reasonably imagine being.
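
As a sketch of that idea, assuming each captured message is a dict with a
hypothetical "rcpt_to" list of envelope recipients, and that the set of
dead (trap) domains is known in advance: keep only mail addressed
exclusively to domains that have no valid recipients. The field name and
example domains are illustrative only.

```python
def recipient_domain(address: str) -> str:
    """Lower-cased domain part of an envelope address like <user@example.org>."""
    return address.strip().strip("<>").rpartition("@")[2].lower()


def trap_sample(messages: list[dict], trap_domains: set[str]) -> list[dict]:
    """Keep messages whose every envelope recipient is in a trap domain.

    No content inspection is involved, so the selection does not depend
    on any filter's opinion of what the message says.
    """
    traps = {d.lower() for d in trap_domains}
    return [
        m for m in messages
        if m["rcpt_to"]
        and all(recipient_domain(r) in traps for r in m["rcpt_to"])
    ]
```

A message that also lists a live recipient is dropped from the sample,
since it could in principle be wanted mail.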

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again. - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


RE: spam wanted :)

2008-04-10 Thread Martin Hannigan

 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf
 Of Marshall Eubanks
 Sent: Thursday, April 10, 2008 9:39 AM
 To: William Waites
 Cc: Rich Kulawiec; North American Network Operators Group
 Subject: Re: spam wanted :)
 
 

[ clip ]

 
 I receive serious amounts of spam in Hebrew and Russian, and haven't 
 even been to either Israel or Russia recently.
 
 Regards
 Marshall
 



I started getting spam in Icelandic 24 hours after my account was set
up. I get Russian, Chinese, and Hebrew spam all the time. The most spam
I receive is for an old domain whose MX records I turned off. Every
now and then I turn them back on to see what's flowing, and it never
changes: mail arrives within seconds.

[obOp] I think that the language change defeats many of the heuristics
found in common spam appliances. 


--
Martin Hannigan  http://www.verneglobal.com/
Verne Global     e: [EMAIL PROTECTED]
Keflavik, Iceland  p: +16178216079



Re: spam wanted :)

2008-04-10 Thread Randy Bush

 this would be a straight sample, before filtering, ip address
 blocking, etc.
 i realize this is difficult, as all of us go through much effort to
 reject this stuff as early as possible.  but it will be a sample
 unbiased by your filtering techniques.
 How do you classify email as spam without adding bias?

reasonable question.  i suspect you pull out the 0.5% of the inbound you
actually wanted and consider the bias small.  as the dnsbls alone block
way over 90% of the inbound here, i would not classify that as small.
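
To put rough numbers on that, a small worked example assuming a batch of
100,000 inbound messages. The batch size is invented, and 90% is used as a
lower bound for "way over 90%".

```python
def remaining_after(inbound: int, removed_per_thousand: int) -> int:
    """Messages left after removing a share given in parts per thousand."""
    return inbound - inbound * removed_per_thousand // 1000


inbound = 100_000                                   # assumed batch size
after_wanted_removed = remaining_after(inbound, 5)  # drop the 0.5% wanted
after_dnsbl = remaining_after(inbound, 900)         # dnsbl blocks 90%

print(after_wanted_removed)  # 99500
print(after_dnsbl)           # 10000
```

Removing the wanted mail leaves 99,500 of the 100,000 in the sample, while
sampling behind the dnsbl would leave at most 10,000 of them.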

randy