Re: Need Volunteers for Ham Trap

2011-02-08 Thread Warren Togami Jr.

On 02/07/2011 05:37 PM, Mahmoud Khonji wrote:

On 01/21/2011 01:06 AM, Warren Togami Jr. wrote:

On 1/20/2011 7:23 AM, R - elists wrote:


initially this came across as a really suspect idea...

i.e., one man's junk is another man's treasure


Ham is a lot easier to define than Spam.  Ham is simply anything that
you subscribed for.



I am currently subscribed to a number of mailing lists to collect ham
emails (in addition to other sources). While it might be true that
mailing lists can be good sources of ham, their emails do not contain
realistic diversity of features/characteristics.


I explicitly excluded discussion mailing lists from the ham trap.



In my view, the issue is not just ensuring an email is ham, but also
ensuring that it contains a realistic set of features. If the features are
not realistic, and if we optimize test scores based on that, then we
might end up worsening test scores for realistic end-users.


Not if it is subscribed to hundreds of opt-in subscriptions for 
legitimate mail that ordinary users receive, most of which is otherwise 
not represented in the corpora.  Many of these subscriptions send mail 
only once a week or month.


It is true that the hamtrap corpus is synthetic and thus not fully 
representative in frequencies of real ham.  But its volume is only a 
tiny fraction of a percent of our total ham.  It helps us to detect and 
fix problems in individual rules by injecting some variety without 
causing a measurable impact on the entire corpus.




For example, most list emails are non-HTML, while most end-user ham and
spam emails are HTML. Evaluating sets of features (or tests) based on
this unrealistic corpus is likely to fool us into thinking that a
feature/test is more effective than it is in reality (i.e. we might
end up giving MIME-based tests higher scores).


The spec and implementation of this ham trap already took this and many 
other issues into consideration.  We've already had a few experts here 
conclude the plan is sound.


I'm somewhat annoyed by the armchair quarterback negative comments on 
this topic.  People (not just you) didn't read the rest of this thread to 
realize this particular concern is moot.  None of the people complaining 
about how this is such a bad idea are being helpful by actually 
participating in the nightly masscheck.


Talk is cheap.  I'm actually doing something.

Warren


Re: Need Volunteers for Ham Trap

2011-02-08 Thread Daniel McDonald



On 2/8/11 3:15 AM, Warren Togami Jr. wtog...@gmail.com wrote:


 I'm somewhat annoyed by the armchair quarterback negative comments on
 this topic.  People (not just you) didn't read the rest of this thread to
 realize this particular concern is moot.

Ditto.  I don't really have time to participate in this activity, but the
methodology is sound and provides a needed source of ham.  Many people want
these opt-in lists, and I don't want to block them.

 None of the people complaining
 about how this is such a bad idea are being helpful by actually
 participating in the nightly masscheck.

I do participate in masschecks, primarily because I have a lot of mail from
politicians (campaign pieces, updates from my congressman, notes from party
officials, and the like) that was getting flagged as spam even though it is
clearly opt in, and unsubscribing is clear and simple.  The main corpus used
in masschecks is the mail for a bunch of techies, and I had a divergent set
of mail from this other interest in my life.  Warren's project extends that
concept much further than just the side-interests of a couple of us
nerds/wonks.

 
 Talk is cheap.  I'm actually doing something.

Keep it up!

 
 Warren

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281



Re: Need Volunteers for Ham Trap

2011-02-07 Thread Mahmoud Khonji
On 01/21/2011 01:06 AM, Warren Togami Jr. wrote:
 On 1/20/2011 7:23 AM, R - elists wrote:
 
 initially this came across as a really suspect idea...

 i.e., one man's junk is another man's treasure
 
 Ham is a lot easier to define than Spam.  Ham is simply anything that
 you subscribed for.
 

I am currently subscribed to a number of mailing lists to collect ham
emails (in addition to other sources). While it might be true that
mailing lists can be good sources of ham, their emails do not contain
realistic diversity of features/characteristics.

In my view, the issue is not just ensuring an email is ham, but also
ensuring that it contains a realistic set of features. If the features are
not realistic, and if we optimize test scores based on that, then we
might end up worsening test scores for realistic end-users.

For example, most list emails are non-HTML, while most end-user ham and
spam emails are HTML. Evaluating sets of features (or tests) based on
this unrealistic corpus is likely to fool us into thinking that a
feature/test is more effective than it is in reality (i.e. we might
end up giving MIME-based tests higher scores).
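
One way to check the HTML concern against an actual corpus is to measure the share of messages that carry a text/html part. The following is only a minimal sketch using Python's standard email package; the function names are hypothetical and nothing here is part of SpamAssassin or the masscheck tooling.

```python
from email import message_from_string


def has_html_part(raw_message):
    """Return True if any MIME part of the message is text/html."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself for non-multipart mail,
    # and every nested part for multipart mail.
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def html_fraction(raw_messages):
    """Fraction of messages in the corpus that contain an HTML part."""
    messages = list(raw_messages)
    if not messages:
        return 0.0
    return sum(has_html_part(m) for m in messages) / len(messages)
```

Comparing this fraction between a list-derived corpus and end-user ham would show how far the two diverge on this one feature.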


Mahmoud


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-21 Thread Martin Gregorie
On Thu, 2011-01-20 at 21:50 -0800, Jeff Chan wrote:

 Yes and no.  If you sign up for Joe's Bagel Company mailing list
 to find out about the latest Bagel news, and some new marketing
 guy joins the Bagel company and starts sending marketing messages
 about Bananas to that list, then the original purpose of the list
 and what you thought you signed up for has been corrupted.  Most
 people would consider the latter to be spam, and rightly so.
 
That's exactly what I was saying about BT. I'm signed up for a
list whose stated purpose is to send billing and service change
notifications about their phone service, but BT's marketing department
suddenly started using it to flog broadband. I think most people would
consider these sales messages to be spam and the BT sales department to
be spammers. 

Under these circumstances I would *not* expect the list to get a
Pure-Ham rating.


Martin




RE: Need Volunteers for Ham Trap

2011-01-20 Thread R - elists

 
 This is a misunderstanding.  I am largely against 
 whitelisting or negative score rules.  I merely intend to 
 increase the variety of legitimate mail in the nightly ham 
 corpus so our spam-hostile rules can be better tested for 
 safety.  This will be interesting especially with non-English ham.
 
 Warren
 

Warren,

so, are you going to keep two or more corpus datasets?

one as it is, and one with the new for comparison?

initially this came across as a really suspect idea... 

i.e., one man's junk is another man's treasure

for a moment, it appeared we were gonna need to review the good and the bad
of spam-l to avoid serious SA list issues.

statistically speaking, this shouldn't sway the scoring substantially anyways
would it?

what should be known so that bad data is not allowed into the HAM corpus ?

 - rh



Re: Need Volunteers for Ham Trap

2011-01-20 Thread Justin Mason
On Tue, Jan 18, 2011 at 12:59, Warren Togami Jr. wtog...@gmail.com wrote:
 On 1/17/2011 11:46 PM, Jeff Chan wrote:

 So a couple points:

 1.  Subscribing to lists opens up lots of grey areas including
 the above.

 2.  Some of the areas are very difficult to resolve into spam or
 ham.  Some more aggressive anti-spammers may say all of the above
 is spam, but others may disagree, and the mail may be legal.

 Before anyone accuses me of being in favor of spammers, please be
 aware that I am personally highly against any of these unethical
 practices, but when essentially making decisions for others, one
 needs to be very careful and consider whether there may be legitimate,
 ethical, legal or even wanted uses of such things.  One person's
 ham may be another person's spam, and vice versa.  However, most
 people don't want the stuff bots send.

 The issue is complex, and there are many deliverability, security
 and anti-spam companies and organizations that struggle with these
 issues every day.  Maintaining accurate ham and spam corpora and
 making policies for what belongs in which category is trivial in
 some easy cases like bot pill spam, but non-trivial in other
 cases.

 Cheers,

 Jeff C.

 I appreciate the nuanced feedback but I have thought of similar
 considerations.  I believe the following will help to avoid ambiguity and
 legal issues.

 * Yes, we cannot be 100% sure our opt-in was only for that particular site
 and not their partners.  But in any case automatic ham trapped mail will
 be only the mail branded by the subscribed provider, because that is the
 only mail we know for sure was opted-in.  Anything else is kept separate for
 later analysis.

 * If clearly spammy other mail arrives at a particular address, the original
 subscription can be unsubscribed and the continued flow monitored.  That
 address could then be discarded.

+1 to those. tagged addressing makes this easy to implement (and track).
I use this approach on a very small scale for a small number of ham newsletters
in my own corpus...

--j.


Re: Need Volunteers for Ham Trap

2011-01-20 Thread Warren Togami Jr.

On 1/20/2011 7:23 AM, R - elists wrote:


initially this came across as a really suspect idea...

i.e., one man's junk is another man's treasure


Ham is a lot easier to define than Spam.  Ham is simply anything that 
you subscribed for.




for a moment, it appeared we were gonna need to review the good and the bad
of spam-l to avoid serious SA list issues.

statistically speaking, this shouldn't sway the scoring substantially anyways
would it?


You are correct.  This is more of a tool to have *some* variety in the 
ham corpus, to make it possible to flag rules in need of scrutiny.  For 
example, prior to 3.3.x many of our rules were utterly broken with 
Japanese mail.  We had no idea of this fact until I added a few thousand 
Japanese messages to the ham corpus.  JM understood the problem and fixed 
those rules.




what should be known so that bad data is not allowed into the HAM corpus ?



The previous discussion described a sort of tagged sender ham trap. 
This simple process automatically excludes extraneous mail in cases 
where the address was shared with affiliates or spammer lists.  We 
also will be careful in sticking to reputable companies and orgs for the 
ham trap.
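
The tagged sender idea can be sketched roughly as follows. This is only an illustration, not the actual ham trap implementation: the address table, the domain names and the classify function are all hypothetical, and matching on the From header alone is an assumption made for brevity (a real setup would presumably rely on something harder to forge).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical table: each unique trap address maps to the domain of the
# one provider it was subscribed to.
SUBSCRIPTIONS = {
    "trap+airline1@hamtrap.example": "airline.example",
    "trap+retail7@hamtrap.example": "retailer.example",
}


def classify(raw_message):
    """Accept mail as ham only when it comes from the subscribed provider."""
    msg = message_from_string(raw_message)
    _, recipient = parseaddr(msg.get("To", ""))
    _, sender = parseaddr(msg.get("From", ""))
    expected = SUBSCRIPTIONS.get(recipient.lower())
    if expected is None:
        return "unknown-address"
    sender_domain = sender.rpartition("@")[2].lower()
    # Allow the provider's own domain and its subdomains.
    if sender_domain == expected or sender_domain.endswith("." + expected):
        return "ham"
    # Anything else that reaches a tagged address is kept apart for review.
    return "held-for-analysis"
```

Mail from an affiliate or a spammer list arriving at a tagged address falls into "held-for-analysis" and never enters the ham corpus automatically.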


Warren


What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread David F. Skoll
On Thu, 20 Jan 2011 11:06:31 -1000
Warren Togami Jr. wtog...@gmail.com wrote:

 Ham is a lot easier to define than Spam.  Ham is simply anything that 
 you subscribed for.

Not necessarily.  You could subscribe to a list expecting it to contain
useful content.  A few months later, the organization running the list
might decide to change what it posts and start posting undesired marketing
information on the list.

Is that still ham?

Regards,

David.


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread Bowie Bailey
On 1/20/2011 4:10 PM, David F. Skoll wrote:
 On Thu, 20 Jan 2011 11:06:31 -1000
 Warren Togami Jr. wtog...@gmail.com wrote:

 Ham is a lot easier to define than Spam.  Ham is simply anything that 
 you subscribed for.
 Not necessarily.  You could subscribe to a list expecting it to contain
 useful content.  A few months later, the organization running the list
 might decide to change what it posts and start posting undesired marketing
 information on the list.

 Is that still ham?

Of course it is.  You subscribed to it.  If you don't want it anymore,
unsubscribe.

If you unsubscribe and they keep sending it anyway, THEN it becomes spam.

-- 
Bowie


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread David F. Skoll
On Thu, 20 Jan 2011 16:12:58 -0500
Bowie Bailey bowie_bai...@buc.com wrote:

 Of course it is.  You subscribed to it.  If you don't want it anymore,
 unsubscribe.

I disagree.  When you subscribe to a list, there's an implicit understanding
of the content you are signing up for.  If the list owner violates the rules
and posts marketing material, that's spam.

Concrete example:  If I posted an ad for our commercial anti-spam system
on the MIMEDefang list, that would be spam.  If I posted it on this list,
it would be spam-squared and I'd probably be banned. :)

Regards,

David.


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread Bowie Bailey
On 1/20/2011 4:17 PM, David F. Skoll wrote:
 On Thu, 20 Jan 2011 16:12:58 -0500
 Bowie Bailey bowie_bai...@buc.com wrote:

 Of course it is.  You subscribed to it.  If you don't want it anymore,
 unsubscribe.
 I disagree.  When you subscribe to a list, there's an implicit understanding
 of the content you are signing up for.  If the list owner violates the rules
 and posts marketing material, that's spam.

 Concrete example:  If I posted an ad for our commercial anti-spam system
 on the MIMEDefang list, that would be spam.  If I posted it on this list,
 it would be spam-squared and I'd probably be banned. :)

Public discussion lists are a bit different.  In that case, it is the
individual post that is being considered spam rather than considering
the list spammy.  Since there is no overall control over the content of
the posts, public lists are vulnerable to being filled with spam if the
list owners are not paying attention.

When you sign up for a company's email list, you get whatever they
decide to send you.  If they decide to start sending marketing to the
list, I would not consider that spam because they own the list and they
can decide what to use it for.  The recipients signed up to get that
company's emails and if they no longer want to receive them, they can
unsubscribe.  And as I said before, if the unsubscribe function doesn't
work, then the emails become spam (regardless of the actual content).

-- 
Bowie


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread Warren Togami Jr.

On 01/20/2011 11:31 AM, Bowie Bailey wrote:


Public discussion lists are a bit different.  In that case, it is the
individual post that is being considered spam rather than considering
the list spammy.  Since there is no overall control over the content of
the posts, public lists are vulnerable to being filled with spam if the
list owners are not paying attention.


For this reason, the ham trap will not be subscribed to any discussion 
lists.




When you sign up for a company's email list, you get whatever they
decide to send you.  If they decide to start sending marketing to the
list, I would not consider that spam because they own the list and they
can decide what to use it for.  The recipients signed up to get that
company's emails and if they no longer want to receive them, they can
unsubscribe.  And as I said before, if the unsubscribe function doesn't
work, then the emails become spam (regardless of the actual content).



Your understanding is exactly correct.

Warren


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread David F. Skoll
On Thu, 20 Jan 2011 16:31:50 -0500
Bowie Bailey bowie_bai...@buc.com wrote:

 When you sign up for a company's email list, you get whatever they
 decide to send you.

OK.  I guess we'll agree to disagree on our definitions, then.

Regards,

David.


Re: What is Ham? (was Re: Need Volunteers for Ham Trap)

2011-01-20 Thread Jeff Chan
On Thursday, January 20, 2011, 1:31:50 PM, Bowie Bailey wrote:
 On 1/20/2011 4:17 PM, David F. Skoll wrote:

 When you sign up for a company's email list, you get whatever they
 decide to send you.  If they decide to start sending marketing to the
 list, I would not consider that spam because they own the list and they
 can decide what to use it for.  The recipients signed up to get that
 company's emails and if they no longer want to receive them, they can
 unsubscribe.  And as I said before, if the unsubscribe function doesn't
 work, then the emails become spam (regardless of the actual content).

Yes and no.  If you sign up for Joe's Bagel Company mailing list
to find out about the latest Bagel news, and some new marketing
guy joins the Bagel company and starts sending marketing messages
about Bananas to that list, then the original purpose of the list
and what you thought you signed up for has been corrupted.  Most
people would consider the latter to be spam, and rightly so.

OTOH if the Bagel company decides to send non-Bagel messages to a
Bagel specific list, then one knows exactly:

1.  Who to blame
2.  Where to unsubscribe
3.  What went wrong
etc.

So at least there is a responsible party to hopefully act on
unsubscriptions, fire the spammy marketer, etc.  It's sort of a
degenerate case of the degenerate case of email addresses going
to a third party, except it's the same party.

Spam is easy.  Ham is hard.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:je...@surbl.org
http://www.surbl.org/



Re: Need Volunteers for Ham Trap

2011-01-19 Thread Jeff Chan
On Tuesday, January 18, 2011, 4:59:05 AM, Warren Jr. wrote:

 * Yes, we cannot be 100% sure our opt-in was only for that particular
 site and not their partners.  But in any case automatic ham trapped
 mail will be only the mail branded by the subscribed provider, because
 that is the only mail we know for sure was opted-in.  Anything else is
 kept separate for later analysis.

 * If clearly spammy other mail arrives at a particular address, the
 original subscription can be unsubscribed and the continued flow 
 monitored.  That address could then be discarded.

Both seem reasonable approaches.

Those degenerate cases of both are indeed interesting.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:je...@surbl.org
http://www.surbl.org/



Re: Need Volunteers for Ham Trap

2011-01-19 Thread Warren Togami Jr.

On 01/18/2011 11:49 PM, Jeff Chan wrote:

On Tuesday, January 18, 2011, 4:59:05 AM, Warren Jr. wrote:


* Yes, we cannot be 100% sure our opt-in was only for that particular
site and not their partners.  But in any case automatic ham trapped
mail will be only the mail branded by the subscribed provider, because
that is the only mail we know for sure was opted-in.  Anything else is
kept separate for later analysis.



* If clearly spammy other mail arrives at a particular address, the
original subscription can be unsubscribed and the continued flow
monitored.  That address could then be discarded.


Both seem reasonable approaches.

Those degenerate cases of both are indeed interesting.

Cheers,

Jeff C.


Yes, I think this is a reasonably simple and effective plan.  I only 
need volunteers to help me find appropriate sites and to help subscribe. 
 It is very boring to do all this myself.


Warren


Re: Need Volunteers for Ham Trap

2011-01-18 Thread Jeff Chan
On Monday, January 17, 2011, 10:52:58 PM, Warren Jr. wrote:
 Hi folks,

 Here is an opportunity for non-developers to do simple tasks to help
 improve SpamAssassin.

 I am seeking volunteers to help me build and administrate a ham trap.
   The idea is to subscribe a list of unique e-mail addresses to various
 retailers, airlines, government and other legitimate bulk mail senders.
   A sufficient variety of ham trap subscriptions should increase the
 variety of legitimate senders represented in nightly masscheck and thus
 improve the safety of SpamAssassin's rules.

 Benefits of the Ham Trap
 
 * Creation of an automated, synthetic source to build a corpus of very
 recent ham for the nightly masscheck.  Ham trap data will be expired
 from the masscheck after 3 months.  This will be fairly easy to maintain
 in a 99% automated fashion, ensuring a constant stream of fresh data for
 the nightly masscheck largely without the need for human sorting.

 * Help to identify legitimate bulk senders who are performing poorly
 with SpamAssassin.  Our data may help legitimate senders to modify their
 mail practices to avoid spamminess.

 * Each subscription is a unique tracked address.  This will make it
 possible to definitively identify bulk senders who violate their 
 customers' privacy by selling their e-mail address list to others.
 There isn't much we can do about these cases other than shame them on a
 web page, but for spam fighters this is useful information.
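
The three-month expiry mentioned in the first bullet could be approximated as below. This is a sketch under stated assumptions: "3 months" is taken as 90 days, the age is read from the Date header rather than from the time of receipt, and is_fresh is a hypothetical name, not a masscheck function.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(days=90)  # assumption: "3 months" treated as 90 days


def is_fresh(raw_message, now):
    """Return True if the message's Date header is within MAX_AGE of now."""
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    if date_header is None:
        return False  # no usable date: treat as expired
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable date: treat as expired
    if sent.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; assume UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= MAX_AGE
```

Running such a check before each nightly run would drop trap mail older than the window without any human sorting.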

While I certainly would encourage improving ham and spam corpora,
this proposal may open up a lot of grey areas that may be
non-trivial to resolve.  

Some of the legitimate mailing lists that sell, share or rent
their addresses to third party senders may be doing so legally if
it's permitted in the terms of use one agrees to when signing up.
Obviously such a practice is questionable at best in terms of
ethics, but it may be technically legal.  There are also
affiliate marketing programs explicitly based on sharing opt
in lists which may be even less ethical and apparently have many
abusers.  Such things may be legal while being unethical.

So a couple points:

1.  Subscribing to lists opens up lots of grey areas including
the above.

2.  Some of the areas are very difficult to resolve into spam or
ham.  Some more aggressive anti-spammers may say all of the above
is spam, but others may disagree, and the mail may be legal.

Before anyone accuses me of being in favor of spammers, please be
aware that I am personally highly against any of these unethical
practices, but when essentially making decisions for others, one 
needs to be very careful and consider whether there may be legitimate,
ethical, legal or even wanted uses of such things.  One person's
ham may be another person's spam, and vice versa.  However, most
people don't want the stuff bots send.

The issue is complex, and there are many deliverability, security
and anti-spam companies and organizations that struggle with these
issues every day.  Maintaining accurate ham and spam corpora and
making policies for what belongs in which category is trivial in
some easy cases like bot pill spam, but non-trivial in other
cases.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:je...@surbl.org
http://www.surbl.org/



Re: Need Volunteers for Ham Trap

2011-01-18 Thread Martin Gregorie
On Tue, 2011-01-18 at 01:46 -0800, Jeff Chan wrote:

 While I certainly would encourage improving ham and spam corpora,
 this proposal may open up a lot of grey areas that may be
 non-trivial to resolve.  
 
Agreed, and some companies will get you to sign up for accounting and
service problem notifications and then pump advertising down the channel
in such volume that the purpose for which you signed up seems utterly
forgotten. 

British Telecom sets a bad example here: they even behave like a spammer
inasmuch as they regularly vary their promotions text to dodge spam
filters. I'd be worried that if word gets around that SA is developing
rules that give signed-up bulk mail a free ride then a lot more
companies will do the same.


Martin




Re: Need Volunteers for Ham Trap

2011-01-18 Thread John Wilcock

Le 18/01/2011 10:46, Jeff Chan a écrit :

2.  Some of the areas are very difficult to resolve into spam or
ham.  Some more aggressive anti-spammers may say all of the above
is spam, but others may disagree, and the mail may be legal.


I'd suggest that SA ought to be classifying e-mail in *three* broad 
categories, not two.


Firstly, definite spam, unsolicited in any way.

Secondly, definite ham (i.e. primarily genuine person-to-person e-mail 
and actively solicited messages such as confirmations of website 
transactions), which even the most aggressive spam-fighters would agree 
is ham. FPs in this category are bad news.


And thirdly, an in-between category, of which opt-in advertising is a 
prime example, which at least some users are happy to receive, but where 
FPs aren't a major problem.


With a few relatively rare exceptions, SA already classifies these 
categories pretty effectively, especially with a well-trained Bayesian 
db. Genuine ham tends to come in with negative scores, occasionally 
straying up to about 1 or 2. Likewise, undisputed spam rarely scores 
less than 8 or 10. And opt-in advertising typically comes in with 
neutral scores of 0 to 4. So far, so good.


Using this opt-in advertising, which IMO ought to be getting neutral 
scores, as a ham corpus, is inevitably going to be problematic. Using it 
as a third, neutral corpus that is given far less weight than genuine 
ham would be a different matter, but would require a major change in the 
scoring algorithms.


John.

--
-- Over 4000 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages- www.tradoc.fr


Re: Need Volunteers for Ham Trap

2011-01-18 Thread Warren Togami Jr.

On 1/17/2011 11:46 PM, Jeff Chan wrote:


So a couple points:

1.  Subscribing to lists opens up lots of grey areas including
the above.

2.  Some of the areas are very difficult to resolve into spam or
ham.  Some more aggressive anti-spammers may say all of the above
is spam, but others may disagree, and the mail may be legal.

Before anyone accuses me of being in favor of spammers, please be
aware that I am personally highly against any of these unethical
practices, but when essentially making decisions for others, one
needs to be very careful and consider whether there may be legitimate,
ethical, legal or even wanted uses of such things.  One person's
ham may be another person's spam, and vice versa.  However, most
people don't want the stuff bots send.

The issue is complex, and there are many deliverability, security
and anti-spam companies and organizations that struggle with these
issues every day.  Maintaining accurate ham and spam corpora and
making policies for what belongs in which category is trivial in
some easy cases like bot pill spam, but non-trivial in other
cases.

Cheers,

Jeff C.


I appreciate the nuanced feedback but I have thought of similar 
considerations.  I believe the following will help to avoid ambiguity 
and legal issues.


* Yes, we cannot be 100% sure our opt-in was only for that particular 
site and not their partners.  But in any case automatic ham trapped 
mail will be only the mail branded by the subscribed provider, because 
that is the only mail we know for sure was opted-in.  Anything else is 
kept separate for later analysis.


* If clearly spammy other mail arrives at a particular address, the 
original subscription can be unsubscribed and the continued flow 
monitored.  That address could then be discarded.


Warren


Re: Need Volunteers for Ham Trap

2011-01-18 Thread Warren Togami Jr.

On 1/18/2011 1:15 AM, Martin Gregorie wrote:

On Tue, 2011-01-18 at 01:46 -0800, Jeff Chan wrote:


While I certainly would encourage improving ham and spam corpora,
this proposal may open up a lot of grey areas that may be
non-trivial to resolve.


Agreed, and some companies will get you to sign up for accounting and
service problem notifications and then pump advertising down the channel
in such volume that the purpose for which you signed up seems utterly
forgotten.

British Telecom sets a bad example here: they even behave like a spammer
inasmuch as they regularly vary their promotions text to dodge spam
filters. I'd be worried that if word gets around that SA is developing
rules that give signed-up bulk mail a free ride then a lot more
companies will do the same.


This is a misunderstanding.  I am largely against whitelisting or 
negative score rules.  I merely intend to increase the variety of 
legitimate mail in the nightly ham corpus so our spam-hostile rules can 
be better tested for safety.  This will be interesting especially with 
non-English ham.


Warren


Re: Need Volunteers for Ham Trap

2011-01-18 Thread Dave Pooser
On 1/18/11 12:52 AM, Warren Togami Jr. wtog...@gmail.com wrote:

 I am seeking volunteers to help me build and administrate a ham trap.
   The idea is to subscribe a list of unique e-mail addresses to various
 retailers, airlines, government and other legitimate bulk mail senders.

The possible fly in the ointment I see is that you wouldn't necessarily have
access to some sorts of transactional emails-- airline flight reminders and
things of that nature. Would that be something where you'd be interested in
getting mail cc:ed to a hamtrap address? For example, I use tagged email
addresses for different airlines, and it would be trivial for me to have my
server relay those messages to a hamtrap address as well as delivering to my
personal email if that sort of thing would be useful.
-- 
Dave Pooser
Cat-Herder-in-Chief
Pooserville.com




Re: Need Volunteers for Ham Trap

2011-01-18 Thread Warren Togami Jr.

On 01/18/2011 03:25 PM, Dave Pooser wrote:

On 1/18/11 12:52 AM, Warren Togami Jr. wtog...@gmail.com wrote:


I am seeking volunteers to help me build and administrate a ham trap.
   The idea is to subscribe a list of unique e-mail addresses to various
retailers, airlines, government and other legitimate bulk mail senders.


The possible fly in the ointment I see is that you wouldn't necessarily have
access to some sorts of transactional emails-- airline flight reminders and
things of that nature. Would that be something where you'd be interested in
getting mail cc:ed to a hamtrap address? For example, I use tagged email
addresses for different airlines, and it would be trivial for me to have my
server relay those messages to a hamtrap address as well as delivering to my
personal email if that sort of thing would be useful.


You are correct that this isn't transactional mail.  It is however 
low-effort automatic collection of a subset of ham that real users 
receive, much of which we are entirely missing from the nightly corpus.


https://fedorahosted.org/auto-mass-check/
As for the ham you suggest, I highly suggest running your own nightly 
masscheck and uploading logs.  This avoids privacy problems and allows 
you to check/correct quality issues in your own corpus.


Warren