Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-21 Thread Justin Shore

On Wed, 18 Jun 2003, Miles Fidelman wrote:

 It occurs to me that a lot of people on this list might have that sort of
 quantitative data - so... any comments?

You might find this useful.

http://zebulon.miester.org/spam/

Justin



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread Curtis Maurand


Interesting pattern.  Kind of looks like cutting z's.  :-)

curtis

just me said:

 On Wed, 18 Jun 2003, Miles Fidelman wrote:

  It occurs to me that a lot of people on this list might have that sort
  of quantitative data - so... any comments?

  Regards,

  Miles Fidelman


 For my little corner:
 http://mrtg.snark.net/spam/

 It seems 1:1 is the norm these days, at least at my scale.

 matto

 [EMAIL PROTECTED]darwin
   Flowers on the razor wire/I know you're here/We are few/And far
   between/I was thinking about her skin/Love is a many splintered
   thing/Don't be afraid now/Just walk on in. #include disclaim.h





Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread Andy Dills

On Wed, 18 Jun 2003, just me wrote:

 For my little corner:
 http://mrtg.snark.net/spam/

 It seems 1:1 is the norm these days, at least at my scale.

How do you get your mail delivery attempts to occur so linearly? :)

I think something's busted with your mrtg script...

Here's the stats for one of the smtp boxes in our cluster (83% rejection
rate...and it's +/- 1% across the other boxes in the cluster):

Postfix log summaries for Jun 18

Grand Totals

messages

 396087   received
 148369   delivered
      0   forwarded
    672   deferred  (9504  deferrals)
   1636   bounced
   718k   rejected (83%)
      0   reject warnings
      0   held
      0   discarded (0%)
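
As a rough sanity check on the 83% figure, here is a minimal sketch. It assumes the percentage is rejected / (delivered + rejected + discarded), which is a guess at how the summary tool reports it and is not confirmed in this thread. It also treats "718k" as roughly 718,000, since the exact count is not given.

```python
# Rough check of the rejection rate from the summary above.
# Assumption: rate = rejected / (delivered + rejected + discarded).
# Assumption: "718k" is taken as ~718,000; the exact count is not in the post.
delivered = 148_369
rejected = 718_000
discarded = 0


def rejection_rate(delivered: int, rejected: int, discarded: int) -> float:
    """Return rejected messages as a fraction of all disposed messages."""
    total = delivered + rejected + discarded
    return rejected / total if total else 0.0


rate = rejection_rate(delivered, rejected, discarded)
print(f"rejection rate: {rate:.1%}")
```

Under those assumptions the result lands close to the reported 83%.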


Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---




Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread Jack Bates

Andy Dills wrote:

 How do you get your mail delivery attempts to occur so linearly? :)

 I think something's busted with your mrtg script...

Depends on which stats he wants. He's showing the total since midnight 
in the graph instead of the count since the last run.

-Jack



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread Andy Dills

On Thu, 19 Jun 2003, Jack Bates wrote:


 Andy Dills wrote:
  How do you get your mail delivery attempts to occur so linearly? :)
 
  I think something's busted with your mrtg script...
 

 Depends on which stats he wants. He's showing the total since midnight
 in the graph instead of the count since the last run.

Yeah, mea culpa :)

Don't know why you have your graphs set up that way, unless you have no
other way of reporting aggregate scores for the day...

http://people.ee.ethz.ch/~oetiker/webtools/mrtg/reference.html

In the absence of 'gauge' or 'absolute' options, MRTG treats variables as
counters and calculates the difference between the current and the
previous value and divides that by the elapsed time between the last two
readings to get the value to be plotted.

Sounds like you have the 'gauge' option set where you shouldn't...unless that
is exactly how you want the graphs to behave, in which case I'll shut up
and respect your right to run mrtg any way you want. :)
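
A minimal sketch of the difference between the two behaviours described above. The sample readings are made up, and the functions only mimic the arithmetic in the quoted reference text; they are not MRTG code.

```python
# Sketch of the two plotting behaviours discussed above, using made-up samples.
# Each sample is (seconds since midnight, cumulative message count).
samples = [(0, 0), (300, 150), (600, 330), (900, 480)]  # hypothetical values


def as_gauge(samples):
    """Gauge: plot each reading as-is, so a running total climbs steadily."""
    return [value for _, value in samples]


def as_counter(samples):
    """Counter: plot (current - previous) / elapsed seconds for each pair."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


print(as_gauge(samples))    # [0, 150, 330, 480]
print(as_counter(samples))  # [0.5, 0.6, 0.5]
```

Fed a since-midnight total, the gauge view produces the near-linear ramp seen in the graphs, while the counter view shows messages per second for each polling interval.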

Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread just me

Not a lot to break; here's the script in its entirety:

#!/usr/local/bin/bash

grep -c mailer=local /var/log/maillog
egrep -c '[EMAIL PROTECTED]|reject|njabl' /var/log/maillog

A lot of mail traffic on my box is mailing lists; perhaps that's why
the graphs look so smooth.
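
For anyone who would rather do the same counting in Python, a rough equivalent follows. It is a sketch only: the third alternative in the egrep pattern is redacted in the archive, so only 'reject' and 'njabl' are used, and the sample log lines are invented.

```python
import re

# Assumption: only the 'reject' and 'njabl' alternatives are used here; the
# third alternative in the original egrep pattern is redacted in the archive.
DELIVERED = re.compile(r"mailer=local")
REJECTED = re.compile(r"reject|njabl")


def count_matches(lines, pattern):
    """Count lines containing a match, like grep -c."""
    return sum(1 for line in lines if pattern.search(line))


def counts_for_file(path="/var/log/maillog"):
    """Return (delivered, rejected) line counts for the log at path."""
    with open(path, errors="replace") as log:
        lines = log.readlines()
    return count_matches(lines, DELIVERED), count_matches(lines, REJECTED)


# Invented log lines, only to show the counting behaviour.
sample = [
    "sendmail[1]: to=<user>, mailer=local, stat=Sent",
    "sendmail[2]: ruleset=check_relay, reject=550 5.7.1 blocked",
    "sendmail[3]: to=<other>, mailer=esmtp, stat=Sent",
]
print(count_matches(sample, DELIVERED))
print(count_matches(sample, REJECTED))
```

Like the shell version, both numbers are totals over the whole log file, so they keep growing until the log is rotated.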

matto


On Thu, 19 Jun 2003, Andy Dills wrote:

  On Wed, 18 Jun 2003, just me wrote:

   For my little corner:
   http://mrtg.snark.net/spam/
  
   It seems 1:1 is the norm these days, at least at my scale.

  How do you get your mail delivery attempts to occur so linearly? :)

  I think something's busted with your mrtg script...

  Here's the stats for one of the smtp boxes in our cluster (83% rejection
  rate...and it's +/- 1% across the other boxes in the cluster):

  Postfix log summaries for Jun 18

  Grand Totals
  
  messages

   396087   received
   148369   delivered
        0   forwarded
      672   deferred  (9504  deferrals)
     1636   bounced
     718k   rejected (83%)
        0   reject warnings
        0   held
        0   discarded (0%)


  Andy

  ---
  Andy Dills
  Xecunet, Inc.
  www.xecu.net
  301-682-9972
  ---





[EMAIL PROTECTED]darwin
   Flowers on the razor wire/I know you're here/We are few/And far
   between/I was thinking about her skin/Love is a many splintered
   thing/Don't be afraid now/Just walk on in. #include disclaim.h



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-19 Thread just me

On Thu, 19 Jun 2003, Andy Dills wrote:

  Yeah, mea culpa :)

  Don't know why you have your graphs set up that way, unless you have no
  other way of reporting aggregate scores for the day...

  http://people.ee.ethz.ch/~oetiker/webtools/mrtg/reference.html

  In the absence of 'gauge' or 'absolute' options, MRTG treats variables as
  counters and calculates the difference between the current and the
  previous value and divides that by the elapsed time between the last two
  readings to get the value to be plotted.

  Sounds like you have the 'gauge' option set where you shouldn't...unless that
  is exactly how you want the graphs to behave, in which case I'll shut up
  and respect your right to run mrtg any way you want. :)


My configuration lets me see daily totals as well as rate vs.
time-of-day pretty easily. Using absolute, the only thing I'd be
able to see is a running total. I like the ability to compare traffic
between days, as well as see when the bulk of my mail is delivered;
any anomalous traffic is pretty easy to spot.
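
If per-interval numbers are ever wanted from the same since-midnight totals, they can be recovered afterwards. A small sketch, assuming the total drops back toward zero when the log is rotated; the readings are hypothetical.

```python
def per_interval(totals):
    """Turn since-midnight running totals into per-interval counts.

    A reading lower than the previous one is treated as a reset (new day),
    so the reading itself is taken as the count for that interval.
    """
    counts = []
    previous = 0
    for total in totals:
        counts.append(total - previous if total >= previous else total)
        previous = total
    return counts


readings = [120, 260, 410, 30, 95]  # hypothetical; a reset happens before 30
print(per_interval(readings))  # [120, 140, 150, 30, 65]
```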

matto

[EMAIL PROTECTED]darwin
   Flowers on the razor wire/I know you're here/We are few/And far
   between/I was thinking about her skin/Love is a many splintered
   thing/Don't be afraid now/Just walk on in. #include disclaim.h



OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Miles Fidelman

Hi Folks,

Someone on the cybertelecom list raised a question about the real costs of
handling spam (see below) in terms of computer resources, transmission,
etc.  This dovetailed a discussion I had recently with several former BBN
colleagues - where someone pointed out that email is not a very high
percentage of total internet traffic, compared to all the multimedia and
video floating around these days.

Since a lot of the arguments about spam hinge on the various costs it
imposes on ISPs, it seems like it would be a good thing to get a handle on
quantitative data.

It occurs to me that a lot of people on this list might have that sort of
quantitative data - so... any comments?

Regards,

Miles Fidelman

-- Forwarded message --
Date: Wed, 18 Jun 2003 09:15:08 -0400
From: Timothy Denton [EMAIL PROTECTED]
Reply-To: Telecom Regulation & the Internet
[EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Issue: the Volume of unwanted email

Cybertelecomers:

I want the advice and knowledge of people on this list. I dared not use the
word spam lest I be filtered out, but the issue is the economic cost of spam
for ISPs.

There has been much to-do about spam of late. Figures from Canarie show that
SMTP transmissions account for about .5% of the volume of Internet traffic.
This may be typical of backbone networks, or not. Commercial networks are
jealous of revealing information of this nature.

ISPs report that spam is now about 46% of email, and that it adds to the
cost of transmissions because of the extra machines that have to be bought
and operated.
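
Taken together, the two figures above imply a very small share of backbone traffic. A quick illustration, assuming spam's share of email by message count also holds by volume, which is not claimed above:

```python
# Figures quoted above; combining them is an illustration only.
smtp_share_of_traffic = 0.005   # "about .5%" of Internet traffic is SMTP
spam_share_of_email = 0.46      # "about 46% of email" is spam

# Assumption: spam's share of email by message count equals its share of
# SMTP bytes, which the post does not claim.
spam_share_of_traffic = smtp_share_of_traffic * spam_share_of_email
print(f"{spam_share_of_traffic:.2%}")  # 0.23%
```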

Question:

What is the economic cost of handling all this spam, in terms of additional
boxes, software, transmission costs etc?

I am aware that spam adds large costs in terms of time and attention at the
user end. Is there evidence of what it adds in terms of hardware and
software?

As we head toward legislative remedies in the US and Canada, I would like to
have a better idea of the economic impact of spam.

Timothy Denton, BA,BCL
37 Heney Street
Ottawa, Ontario
Canada K1N 5V6
www.tmdenton.com
1-613-789-5397
[EMAIL PROTECTED]





Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Eric Brunner-Williams in Portland Maine

While the question (metrics for operators, backbone-to-retail, spam) is
current in the asrg list, the question is posed (informally) by the
(outgoing) secretary of the ICANN Registrar's Constituency to a listserv
in the AOL playpen. The question is not current in the Registrar's
Constituency, nor is it likely to be, IMHO.

There are several ways nanog'ers can take it, back to the AOL listserv,
or over the fence to the irtf/asrg playpen, or yawn.

There is one modality of spam that interests me technically, one that
Bill touched on in his note in the rr style scanning thread, and
Sean and others have touched on in the use trojans thread. Buffering
up hosts (acquired via technical means), and expending hosts (sending
until some terminal condition occurs) at a rate approximating the rate
of buffer-fill.

Anyone else interested drop me a line. Better still would be the peer
reviewed paper in the open literature that answers all the questions
I've thought of, and haven't thought of.

Eric


Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Jack Bates

Miles Fidelman wrote:

 Since a lot of the arguments about spam hinge on the various costs it
 imposes on ISPs, it seems like it would be a good thing to get a handle on
 quantitative data.

While there is a cost to ISPs regarding spam, the highest cost is still
on the recipient: end users who are outraged by their children getting
pornography in email, or who have trouble finding their legitimate emails
due to the sheer volume of spam that fills their inbox. There are cases
where emails are so far out of RFC 822 compliance that the mail clients
lock up or crash when attempting to read the message. Time is expended
across the board in handling, blocking, verifying, or deleting spam. In
this day and age, time is often more valuable than money and the assigned
value is dependent on the individual. Unfortunately, end users cannot
just highlight and hit delete on spam. They must look at almost every
email to verify that it is spam and not a business or personal email.
The misleading subject lines and forgeries are making this even more
necessary.

-Jack





Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Paul Vixie

[EMAIL PROTECTED] (Jack Bates) writes:

 While there is a cost to ISPs regarding spam, the highest cost is still
 on the recipient. End users who are outraged by their children getting
 pornography in email, or having trouble finding their legitimate emails
 due to the sheer volume of spam that fills their inbox.

yes.

lartomatic=# select date(entered),count(*)
 from spam
 where date(entered) > now()-'20 days'::interval
 group by date(entered)
 order by date(entered) desc;
    date    | count 
------------+-------
 2003-06-18 |   505
 2003-06-17 |   873
 2003-06-16 |   644
 2003-06-15 |   621
 2003-06-14 |   667
 2003-06-13 |   396
 2003-06-12 |   696
 2003-06-11 |   517
 2003-06-10 |   673
 2003-06-09 |   616
 2003-06-08 |   421
 2003-06-07 |   398
 2003-06-06 |   558
 2003-06-05 |   534
 2003-06-04 |   616
 2003-06-03 |   464
 2003-06-02 |   555
 2003-06-01 |   677
 2003-05-31 |   378
 2003-05-30 |   642
(20 rows)

that's actually not too bad.  the trend is flattening after the Q1'03 surge.
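
For a quick summary of those 20 days, a small sketch using only the counts shown in the query output. A window this short cannot show the Q1 surge itself, so this only describes the recent level.

```python
from statistics import mean

# Daily counts copied from the query output above, newest first.
counts = [505, 873, 644, 621, 667, 396, 696, 517, 673, 616,
          421, 398, 558, 534, 616, 464, 555, 677, 378, 642]

total = sum(counts)
average = mean(counts)
print(f"{len(counts)} days, {total} messages, {average:.1f} per day")
print(f"min {min(counts)}, max {max(counts)}")
```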

 In this day and age, time is often more valuable than money and the
 assigned value is dependent on the individual. Unfortunately, end users
 cannot just highlight and hit delete on spam. They must look at almost
 every email to verify that it is spam and not a business or personal
 email.  The misleading subject lines and forgeries are making this even
 more necessary.

let's not lose sight of the privacy and property issues, though.  even if
all spam were accurately marked with SPAM: (or ADV:) in its subject
line and there were no false positives, there is no implied right to send
it since it still shifts costs toward the recipient(s).  all communication
should be by mutual consent, and one way or another, some day it will be.
-- 
Paul Vixie


RE: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Drew Weaver

Since 00:00 (EST)

      1 ACL from_senders_bogus
      1 ETRN Mail theft attempt
      1 ACL mta_clients_relay
      1 SMTP Exceeded Hard Error Limit after RSET
      1 ACL mta_clients_onedict
      2 SMTP Exceeded Hard Error Limit after MAIL
      4 ACL mta_clients_senders_regexp
      4 SMTP Exceeded Hard Error Limit after CONNECT
      7 ACL [EMAIL PROTECTED]
      9 SMTP invalid [EMAIL PROTECTED]
     21 ACL helo_hostnames
     42 SMTP unauthorized pipelining
     55 ACL mta_clients_slet
     64 SMTP Exceeded Hard Error Limit after DATA
     93 ACL mta_clients_bogus
    107 ACL to_recipients_dead
    148 ACL to_local_recipients unknown recipient
    354 ACL unauthorized relay
    426 ACL mta_clients_blaksender
    506 ACL mta_clients_dead
    594 ACL from_senders_nxdomain
   1054 ACL from_senders_black
   1125 DNS timeout for MTA PTR hostname (forged @sender.domain)
   1658 SMTP sender address verification in progress
   2251 ACL from_senders_black_regexp
   2678 ACL from_senders_slet
   2734 DNS no A/MX for @sender.domain
   3770 SMTP sender address undeliverable
   4572 RBL rbl-plus.mail-abuse.org
   4703 DNS nxdomain for MTA PTR hostname (forged @sender.domain)
   5152 ACL from_senders_imgfx
   5334 ACL mta_clients_bw
   9846 SMTP sender address unverifiable
  66969 SMTP Exceeded Hard Error Limit after RCPT
 217244 ACL to_relay_recipients unknown recipient

 331531 TOTAL
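
To put the largest rows in proportion, a short sketch that computes each one's share of the posted total. The numbers are copied from the table; nothing else is assumed.

```python
# Largest rejection categories from the table above, with the posted total.
total = 331_531
top = {
    "ACL to_relay_recipients unknown recipient": 217_244,
    "SMTP Exceeded Hard Error Limit after RCPT": 66_969,
    "SMTP sender address unverifiable": 9_846,
}

for reason, count in top.items():
    print(f"{count / total:6.1%}  {reason}")

combined = sum(top.values())
print(f"{combined / total:6.1%}  top three combined")
```

The unknown-recipient relay rejections alone account for roughly two thirds of the day's total.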



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Eric A. Hall


on 6/18/2003 9:51 AM Miles Fidelman wrote:

 Someone on the cybertelecom list raised a question about the real costs
 of handling spam (see below) in terms of computer resources,
 transmission, etc.  This dovetailed a discussion I had recently with
 several former BBN colleagues - where someone pointed out that email is
 not a very high percentage of total internet traffic, compared to all
 the multimedia and video floating around these days.

The major cost items I've seen are increased bandwidth costs (measured
rate), equipment, filtering software/services, and personnel. These costs
vary depending on the size of the organization and the kinds of service
the organization provides (as a dramatic example, the cost burden is
proportionally higher for an email house like pobox than it would be for
yahoo). There are other indirect costs too; lots of organizations have
stopped sharing backup MX services because of problems with asymmetrical
filtering, which can translate into more outages, which can lead to ...

My feeling is that any organization with at least one full-time spam
staffer could probably come up with a minimal cost estimate of $.01 per
message. End-users with measured rate services (eg, cellular) can also
reach similar loads with little effort. But due to the variables and
competitive concerns, you'll probably have to go door-to-door with a
non-disclosure agreement to get people to cough up their exact costs,
assuming they are tracking it.

 There has been much to-do about spam of late. Figures from Canarie show
 that SMTP transmissions account for about .5% of the volume of Internet
 traffic. This may be typical of backbone networks, or not. Commercial
 networks are jealous of revealing information of this nature.

The backbone utilization isn't going to be relevant unless it is high
enough to affect the price of offering the connection. The mailstore is
where the pressure is at. Companies and users who sink capital and time
into unnecessary maintenance have always been the victims. These costs
also have secondary effects, like permanently delaying rate reductions
(sorry your tuition went up again, but we had to buy another cluster),
which in turn affects other parties, but the bulk of the pressure is
wherever the mailstore is at.

-- 
Eric A. Hall             http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Petri Helenius

 value is dependent on the individual. Unfortunately, end users cannot
 just highlight and hit delete on spam. They must look at almost every

Isn't highlight and hit delete exactly what has been implemented since
Mozilla 1.3 and works with almost perfect accuracy after you give it a few
dozen messages to build up the good and bad database with?

PEte



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Jack Bates

Petri Helenius wrote:

 Isn't highlight and hit delete exactly what has been implemented since
 Mozilla 1.3 and works with almost perfect accuracy after you give it a few
 dozen messages to build up the good and bad database with?

Actually, I find that 1.3 and 1.4 still have issues with determining
spam. While fairly decent, one still has to go through looking for false
positives. The other issue is that spammers have been doing a good job
at designing emails to fool filters. I'm starting to see more and more
spam designed to defeat Bayesian filters. By including good words in
their emails, they either make good words spammy so that you get more
FPs or they make their email clean enough that it's still in your
inbox. The worst part of it is that spam is quickly becoming unreadable,
so that legitimate emails that are readable are the emails more likely
to be filtered.
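
To illustrate the "good words" trick, here is a toy sketch of a naive Bayes style score. The per-word probabilities are invented, and this is not Mozilla's actual implementation; it only shows how padding a message with innocent words can drag the combined score down.

```python
import math

# Hypothetical per-word spam probabilities a trained filter might hold.
# These numbers are invented for illustration only.
word_spam_prob = {
    "viagra": 0.99, "free": 0.90, "click": 0.85,
    "meeting": 0.05, "routing": 0.03, "bgp": 0.02,
}


def spam_score(words, probs=word_spam_prob, unknown=0.5):
    """Combine per-word probabilities with the naive Bayes product rule."""
    log_spam = log_ham = 0.0
    for word in words:
        p = probs.get(word, unknown)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # P(spam) = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


plain = ["viagra", "free", "click"]
padded = plain + ["meeting", "routing", "bgp"]
print(f"plain:  {spam_score(plain):.4f}")
print(f"padded: {spam_score(padded):.4f}")
```

With these made-up numbers the plain message scores well above 0.99, while the padded one falls below 0.5 and would slip through a filter that uses 0.5 as its threshold.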

-Jack



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Paul Timmins

On Wed, 2003-06-18 at 17:09, Jack Bates wrote:
  The worst part of it is that spam is quickly becoming unreadable, 
 so that legitimate emails that are readable are the emails more likely 
 filtered.
 
 -Jack

On the upside, this means replacing the spam filter with a spell checker
will move us toward 100% accuracy! :-)
-Paul

-- 
Paul Timmins
[EMAIL PROTECTED] / http://www.timmins.net/
H: 313-586-9514 / C: 248-379-7826 / DC: 130*116*24495
AIM: noweb4u / Callsign: KC8QAY



Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread Petri Helenius

 Actually, I find that 1.3 and 1.4 still have issues with determining
 spam. While fairly decent, one still has to go through looking for false
 positives. The other issue is that spammers have been doing a good job
 at designing emails to fool filters. I'm starting to see more and more
 spam designed to defeat Bayesian filters. By including good words in
 their emails, they either make good words spammy so that you get more
 FP's or they make their email clean enough that it's still in your
 inbox. The worst part of it is that spam is quickly becoming unreadable,
 so that legitimate emails that are readable are the emails more likely
 filtered.

I hope I never get your legitimate email. :) After about the first 100 messages
I practically stopped checking the Junk folder, because no false positives
occurred. Just for the sake of this message, I peeked into the folder and
scrolled through the last ~300 messages, and all were spam.

About one in 50 does not get flagged, and this stream has already gone through
basic checks, such as requiring the sender to have a legitimate domain name.

So I'm a happy camper, and I hope that legislation catches up with spammers
before they figure out a surefire way to defeat Bayesian filters.

Pete




Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread JC Dill

Jack Bates wrote:

 Petri Helenius wrote:

  Isn't highlight and hit delete exactly what has been implemented since
  Mozilla 1.3 and works with almost perfect accuracy after you give it a
  few dozen messages to build up the good and bad database with?

 Actually, I find that 1.3 and 1.4 still have issues with determining
 spam. While fairly decent, one still has to go through looking for false
 positives. The other issue is that spammers have been doing a good job
 at designing emails to fool filters. I'm starting to see more and more
 spam designed to defeat Bayesian filters. By including good words in
 their emails, they either make good words spammy so that you get more
 FP's or they make their email clean enough that it's still in your
 inbox. The worst part of it is that spam is quickly becoming unreadable,
 so that legitimate emails that are readable are the emails more likely
 filtered.

I have not found this to be the case.  While I don't manage an abuse
mailbox, I do manage a busy mailing list.  The mailing list address and
administrative addresses have been picked up by spammers and are
probably now on all those millions of email addresses CDs.  The
mailing list address and administrative addresses are also both
regularly forged (used to send spam) so I get all the undeliverable
spams mixed in with all the undeliverable actual list email.

Until I started using the Bayesian filters in Mozilla, weeding thru the
spam to find the actual administrative emails that needed my attention
was a very big chore, and my false positive rate utilizing JHD was
fairly high.  Now Mozilla filters for me, and has a much lower false
positive rate.

Note, I fed Mozilla's Bayesian filters two folders, each containing over
1000 emails, one full of spam and one full of legitimate administrative
email, to train it to learn what was and wasn't acceptable email.  Hand
sorting until I had these two seed folders took a fair bit of time, but
it was clearly worth it!

The Bayesian filters are the main reason I'm using Mozilla.  Eudora does
some things much better than Mozilla, but I can't live without the spam
filters anymore!

jc







Re: OT: question re. the Volume of unwanted email (fwd)

2003-06-18 Thread just me

On Wed, 18 Jun 2003, Miles Fidelman wrote:

  It occurs to me that a lot of people on this list might have that sort of
  quantitative data - so... any comments?

  Regards,

  Miles Fidelman


For my little corner:
http://mrtg.snark.net/spam/

It seems 1:1 is the norm these days, at least at my scale.

matto

[EMAIL PROTECTED]darwin
   Flowers on the razor wire/I know you're here/We are few/And far
   between/I was thinking about her skin/Love is a many splintered
   thing/Don't be afraid now/Just walk on in. #include disclaim.h