Re: [Gossip] status report + look and feel questions

2004-11-25 Thread Fred H Olson
On Wed, 24 Nov 2004, Earl Hood wrote:

> > On my lists I still find that requiring posts to come from subscribed
> > addresses keeps virtually all spam from being distributed. I've had
> > very few if any instances of spammers subscribing to a list to spam it.
> > Does mail-archive archive lists to which anyone can post?
> List administration is handled by the list owners, not by mail-archive.
> Therefore, if the list owner allows anyone to post to the list, then
> the messages will get archived (unless spam filters
> believe such messages are spam).

I would think that mail-archive is a useful enough service that it
could put requirements on list owners, or at least strongly urge them to adopt
reasonable policies. For example, surely you would not archive a list
that encourages spam, would you? [I just realized I should look at your
site's policies - later, I guess.]

Enforcing such a policy would be difficult, but it would give you a tool to deal
with the worst cases and may influence some list owners just by being

It would be pretty easy to test lists for subscriber-only posting:

   This is a test message being sent from a special address or to all
   lists archived at...
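A rough sketch of how such a probe could be automated, in Python. It assumes a dedicated sender address that is not subscribed to any list (the address below is hypothetical) and a unique token per list, so that a later search of the archive can tell which lists let the probe through. Actually sending the messages (e.g. with `smtplib`) and searching the archive are left out.

```python
from email.message import EmailMessage

# Hypothetical probe address; it must NOT be subscribed to any list.
PROBE_SENDER = "posting-probe@example.org"

def build_probe(list_address: str, token: str) -> EmailMessage:
    """Build one test message for one list, tagged with a unique token."""
    msg = EmailMessage()
    msg["From"] = PROBE_SENDER
    msg["To"] = list_address
    # The token lets us find this exact message in the archive later.
    msg["Subject"] = f"Posting policy test {token}"
    msg.set_content(
        "This is a test message being sent from a special address "
        "to all lists archived at mail-archive.\n"
    )
    return msg

def open_lists(sent_tokens: dict, archived_tokens: set) -> list:
    """Lists whose probe showed up in the archive, i.e. lists that accept
    posts from non-subscribers. sent_tokens maps list address -> token."""
    return sorted(addr for addr, tok in sent_tokens.items() if tok in archived_tokens)
```

A list whose token never appears in its archive either restricts posting to subscribers or filtered the probe as spam, so a missing token is only weak evidence.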

> > With my browser (Mozilla 1.4.1) the ads occasionally prevent the last few
> > characters of a message line from being displayed. Example, in:
> >
> > The end of the third line on my display reads
> What operating system?  Message looks fine to me, but I'm using a
> later version of Mozilla.

Linux, Fedora Core 1, using Gnome

On Wed, 24 Nov 2004, Jeff Breidenbach wrote:

> Fred, thanks for the feedback. Keep it coming if you have more.

Sorry if I came across as overly critical; mail-archive is a big
improvement for many lists. 

> Does the problem fix itself if you make the browser wider?  Can I get
> a screenshot? Send it to my gmail account please <[EMAIL PROTECTED]>

I'll try to get to this after Thanksgiving - gotta run now.

> Sorry - the name "gossip" was the first thing I thought of in
> 1998. I've changed the footer as you suggested. 

Thanks.  Someday I'll learn procmail so I can manipulate such things to my liking.
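For what it's worth, the same kind of subject-tag rewrite can be done outside procmail. The Python filter below is a sketch, not a procmail recipe: it swaps the `[Gossip]` tag for `[mailarc]` (the tag Fred suggested) in a raw message. How it would be hooked into local delivery (for instance as a pipe from a procmail rule) is an assumption, and the function names are hypothetical.

```python
import re
from email import message_from_bytes, policy

# Matches the list tag case-insensitively, e.g. "[Gossip]" or "[gossip]".
TAG = re.compile(r"\[gossip\]", re.IGNORECASE)

def retag(subject: str, new_tag: str = "[mailarc]") -> str:
    """Replace the first list tag in a subject line."""
    return TAG.sub(new_tag, subject, count=1)

def retag_message(raw: bytes, new_tag: str = "[mailarc]") -> bytes:
    """Parse a raw message, rewrite its Subject tag, and serialize it again."""
    msg = message_from_bytes(raw, policy=policy.default)
    if msg["Subject"] is not None:
        msg.replace_header("Subject", retag(str(msg["Subject"]), new_tag))
    return msg.as_bytes()
```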


Fred H. Olson  Minneapolis,MN 55411  USA(near north Mpls)
Communications for Justice - My new listserv org.   UU, Linux
My Link Page:   Ham radio:WB0YQM
fholson at   612-588-9532   (7am-10pm Central time)

Discussion list for The Mail Archive

Re: [Gossip] status report + look and feel questions

2004-11-24 Thread Jeff Breidenbach

Fred, thanks for the feedback. Keep it coming if you have more.

>Does mail-archive archive lists to which anyone can post?

Because the service is so big, we probably have the full range. Lists
without spam problems, lists with spam problems, and (unfortunately)
lists that ARE spam problems. The main reason for basic spam filtering
on the archival inbox is to not waste resources archiving spam. That
and the fact that we detest spammers.

>the ads occasionally prevent the last few characters of a message

That may be something we can address. 

Does the problem fix itself if you make the browser wider?  Can I get
a screenshot? Send it to my gmail account please <[EMAIL PROTECTED]>

>I did see one ad for [junk]

Yeah, at least to my eye the Google ads seem to be declining in
quality, with far too many of them either irrelevant or
sketchy. Since we need the ads to cover costs, I'm not really sure
what to do about it. But I agree that it is a growing concern and
I would like to find a way to address the problem.

>The list name link in the upper left corner of a message page and of
>index pages brings up an index page.  Such a link on index pages is
>pretty useless [...]

Completely agree. We should either try to make that link more useful,
or (perhaps more likely) unlinkify the listname on index pages.

>How about changing the tag to be something like "mailarc" and putting
>something in message footers like "Mail Archive talk"

Sorry - the name "gossip" was the first thing I thought of in
1998. I've changed the footer as you suggested. I'm leaving the tag as
[gossip] to match the listname, but in the long, long term I will think
about whether or not to change the name of the list.


Re: [Gossip] status report + look and feel questions

2004-11-24 Thread Earl Hood
On November 24, 2004 at 10:29, Fred H Olson wrote:

> On my lists I still find that requiring posts to come from subscribed
> addresses keeps virtually all spam from being distributed. I've had
> very few if any instances of spammers subscribing to a list to spam it.
> Does mail-archive archive lists to which anyone can post?

List administration is handled by the list owners, not by mail-archive.
Therefore, if the list owner allows anyone to post to the list, then
the messages will get archived (unless spam filters
believe such messages are spam).

> As one last precaution I have new subscribers' first messages moderated
> (sent to the reject page) so I'd catch a subscribed spammer's first
> message.  This has the added advantage of catching some "please
> unsubscribe me" messages from people who never post anything else.

That may be a good practice for list administrators, but mail-archive
does not perform any list administration functions.

> -- Advertising on mail-archive --
> Regrettable that you have to have it, but it's more tolerable than Yahoo's.
> With my browser (Mozilla 1.4.1) the ads occasionally prevent the last few
> characters of a message line from being displayed. Example, in:
> The end of the third line on my display reads

What operating system?  Message looks fine to me, but I'm using a
later version of Mozilla.

> The list name link in the upper left corner of a message page and of index
> pages brings up an index page.  Such a link on index pages is pretty
> useless, it would be much better to link to the list's "info page" (I think
> all lists should and most do have these) which in turn has description of
> list, subscription info etc. Are there links somewhere to contact info
> for archived lists?

mail-archive is as automated as possible, including the detection
of new lists to archive. This helps keep operational costs down. Right
now, there are no facilities for list administrators to register
list info, and such capabilities would require human-based review
for content.

I believe the folks at mail-archive have considered additional
features similar to this, but such things will probably not get added
unless they can be automated and done in a secure fashion.


Gossip mailing list

Re: [Gossip] status report + look and feel questions

2004-11-24 Thread Fred H Olson
On Mon, 22 Nov 2004, Jeff Breidenbach wrote
wrt :

> Recent changes have been mostly behind the scenes. Here are some of
> the highlights that haven't been mentioned yet on gossip:
>   a) Improved spam filtering on the archives. Unfortunately there's
>  so much junk flying around the internet that we had to get 
>  a little more serious at filtering the archival inbox.

On my lists I still find that requiring posts to come from subscribed
addresses keeps virtually all spam from being distributed. I've had
very few if any instances of spammers subscribing to a list to spam it.
Does mail-archive archive lists to which anyone can post?
The ISP that hosts my lists has filtering (greylisting) that keeps most
spam to my big list ** from getting through to where I have to look at it
on the reject page (where messages from non-subscribed addresses go).
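Greylisting, roughly: the first delivery attempt from an unfamiliar (client IP, envelope sender, recipient) triplet gets a temporary 4xx refusal, and a retry after some delay is accepted. Real mail servers retry; many spam engines don't. A minimal in-memory sketch in Python; the 300-second delay and the use of reply code 451 are assumptions, and a production greylister would also persist its state and expire old triplets.

```python
import time
from typing import Optional

class Greylist:
    """In-memory greylisting keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay_seconds: float = 300.0):
        # Assumed minimum time a legitimate server waits before retrying.
        self.delay = delay_seconds
        # Triplet -> time of its first delivery attempt.
        self.first_seen: dict = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> int:
        """Return an SMTP reply code: 451 (try again later) or 250 (accept)."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # Unknown triplet: remember it and ask the sender to retry later.
            self.first_seen[key] = now
            return 451
        # Known triplet: accept once the retry comes after the delay.
        return 250 if now - first >= self.delay else 451
```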

As one last precaution I have new subscribers' first messages moderated
(sent to the reject page) so I'd catch a subscribed spammer's first
message.  This has the added advantage of catching some "please
unsubscribe me" messages from people who never post anything else.
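Fred's posting policy boils down to a small decision function. This Python sketch is hypothetical (it is not the API of any particular list manager) and assumes addresses are compared case-insensitively.

```python
def route_post(sender: str, subscribers: set, has_posted: set) -> tuple:
    """Decide what happens to an incoming post under Fred's policy.

    subscribers and has_posted hold lowercased addresses. Returns
    (action, reason), where action is "distribute" or "reject_page".
    """
    addr = sender.strip().lower()
    if addr not in subscribers:
        # Posts from non-subscribed addresses go to the reject page.
        return ("reject_page", "not subscribed")
    if addr not in has_posted:
        # A new subscriber's first message is moderated too; this catches
        # subscribed spammers and stray "please unsubscribe me" notes.
        return ("reject_page", "first post from new subscriber")
    return ("distribute", "")
```

Once the owner approves a first post, the address would be added to `has_posted` so later messages go straight through.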

** (~150 msg/month;  ~500 subs -
admittedly higher than average on the civility scale due to the topic)

-- Advertising on mail-archive --
Regrettable that you have to have it, but it's more tolerable than Yahoo's.
With my browser (Mozilla 1.4.1) the ads occasionally prevent the last few
characters of a message line from being displayed. Example, in:
The end of the third line on my display reads
   Gurban to be described as "not as qualifie
Curiously, when I copied and pasted the line into this message,
the missing characters (d") were there...

I did see one ad for what appeared to be lists of "b2b" addresses to be
solicited (spammed) - sorry I did not keep track of specifics.

-- misc --

The list name link in the upper left corner of a message page and of index
pages brings up an index page.  Such a link on index pages is pretty
useless, it would be much better to link to the list's "info page" (I think
all lists should and most do have these) which in turn has description of
list, subscription info etc. Are there links somewhere to contact info
for archived lists?

Lastly, why does this list have the meaningless name and subject line
tag "gossip"? How about changing the tag to be something like "mailarc"
and putting something in message footers like "Mail Archive talk"




[Gossip] status report + look and feel questions

2004-11-22 Thread Jeff Breidenbach

Recent changes have been mostly behind the scenes. Here are some of
the highlights that haven't been mentioned yet on gossip:

  a) Improved spam filtering on the archives. Unfortunately there's
 so much junk flying around the internet that we had to get 
 a little more serious at filtering the archival inbox.

  b) YahooGroups lists are no longer banned. This is in part because
 we now have better processing capacity, and in part because I
 expect fewer YahooGroups-related headaches.

  c) Improved network infrastructure. This is mostly behind the scenes
 in terms of the number of redundant mail servers, management of DNS
 servers, network monitoring, backups, etc. You probably won't
 notice anything, but it makes Jeff's and my lives a little easier.

Also now that people have had a chance to play with the new look 
and feel for a few weeks, I'd like to get some comments.

 - Do the pages load fast enough for everyone?

 - Anyone unhappy due to browser compatibility problems?

 - Is anyone actually honest-to-god using the date navigation 
   links at the bottom of message pages now that we have keyboard 
   shortcuts? Speak up or my minimalist sensibilities will take over
   and I'll get rid of them.

 - Does anyone prefer the old layout?



[Gossip] status

2004-05-10 Thread Jeff Breidenbach

Archiving system is still offline, and the queue size is 815 MB at the
moment. Made some progress this weekend on behind-the-scenes data
copying.  When we are up and processing (sometime this coming
week!?!), processing will be out of order: newest first, while feeding from
the backlog slowly.
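One way to read "newest first, while feeding from the backlog slowly" is a loop that handles whatever new mail has arrived, then takes a small fixed number of messages from the backlog, and repeats. A Python sketch under that assumption; the `backlog_per_pass` knob is hypothetical.

```python
from collections import deque

def schedule(new_batches, backlog, backlog_per_pass=1):
    """Yield messages in processing order.

    new_batches: one list per pass, holding mail that arrived before it.
    backlog: queued messages, oldest first.
    """
    pending = deque(backlog)
    for arrivals in new_batches:
        # Fresh mail always goes ahead of the backlog.
        yield from arrivals
        # Then trickle a few messages out of the backlog.
        for _ in range(min(backlog_per_pass, len(pending))):
            yield pending.popleft()
    # When no more live passes are simulated, drain whatever is left.
    yield from pending
```

Raising `backlog_per_pass` trades archiving latency for current mail against how quickly the backlog clears.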


Gossip mailing list

[Gossip] status, right now

2002-10-23 Thread Jeff Breidenbach

Ok -- 

Zamboni (the old server) will never update again. Poet (the new
server) is processing current mail and also getting some older archive
messages recopied into it as we speak. You'll know things are done when and resolve to the
same machine.
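The two hostnames in the note above did not survive archiving, so the names in the usage line below are placeholders. A quick Python check of whether two names currently resolve to a common address might look like this; a local resolver may keep returning cached answers while DNS changes propagate.

```python
import socket

def addresses(host: str) -> set:
    """All IP addresses the resolver currently returns for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}

def same_machine(host_a: str, host_b: str) -> bool:
    """True if the two names share at least one resolved address."""
    return bool(addresses(host_a) & addresses(host_b))
```

Usage (hypothetical names): `same_machine("old-name.example.org", "new-name.example.org")`.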

Poet's mail exchanger (MX) has stronger spam filtering and should
become active as DNS changes propagate, so let me know if any of you
experience unusual bounces.



Re: [Gossip] status, mail-archive

2002-09-23 Thread Kir Kolyshkin

Jeff Breidenbach wrote:

>  5) Regarding the common "phrase search" feature request, it looks
> like htdig 3.2 is nowhere near ready to go, so that's not
> happening any time soon.

Again, what about giving ASPseek a try? I'm one of the developers ;)

Guinness a Day Keeps a Doctor Away (people's wisdom)


[Gossip] status, mail-archive

2002-09-22 Thread Jeff Breidenbach

Lots of news on next generation system:

 1) Configuration work {exim, htdig, analog, bind, mailmen ...}  is
done. System is running great in shadow mode.  Many thanks to my

 2) TODO: insertion into final network and switchover.

 3) HTML page count is about 10 million; I'm consulting with the
reiserfs team on how to get really fast file counts. The 10
million number is from precise but slow measurement; currently
estimating deltas based on df.
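Roughly how a df-based estimate like this can work: take one precise but slow count together with the disk usage at that moment, then assume new data has the same average bytes per file. A Python sketch; the baseline figures are whatever the last full count produced (the example numbers are made up), and the linear assumption is exactly why such an estimate drifts once storage efficiency changes.

```python
import os
import shutil

def count_files(root: str) -> int:
    """Precise but slow: walk the whole tree and count non-directory entries."""
    return sum(len(files) for _dirpath, _dirnames, files in os.walk(root))

def estimate_from_usage(used_now: int, baseline_count: int, baseline_used: int) -> int:
    """Linear estimate: new data is assumed to have the same average
    bytes per file as the data present at the precise count."""
    bytes_per_file = baseline_used / baseline_count
    return round(baseline_count + (used_now - baseline_used) / bytes_per_file)

def estimate_now(path: str, baseline_count: int, baseline_used: int) -> int:
    """Cheap estimate from current disk usage (the df 'Used' figure)."""
    return estimate_from_usage(shutil.disk_usage(path).used, baseline_count, baseline_used)
```

For example, a baseline of 10,000,000 files in 100 GB gives 10,000 bytes per file, so 120 GB in use estimates to 12,000,000 files.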

 4) Exim will use sender_verify_hosts_callback on switchover.  Rumored
to be a good spam shield, but we may have some false positives.
We'll see.
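Roughly, a sender verification callback (also widely known as a callout) works by connecting to the sender domain's mail server before accepting a message and asking, with a null sender, whether it would accept mail for the claimed sender address. The Python sketch below only illustrates the idea; it is not Exim's code, it skips the MX lookup (the host is passed in), and the verdict mapping is an assumption. It also shows where false positives come from: a server that refuses `MAIL FROM:<>` makes a perfectly valid sender look bogus.

```python
import smtplib

def classify(code: int) -> str:
    """Map the RCPT TO reply code to a verification verdict."""
    if 200 <= code < 300:
        return "accept"   # address accepted (or the server accepts everything)
    if 400 <= code < 500:
        return "defer"    # temporary failure: try again later
    return "reject"       # permanent failure: treat the sender as bogus

def callout(mx_host: str, sender: str, timeout: float = 30.0) -> str:
    """Ask the sender's mail server whether it would accept mail for sender."""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo()
            smtp.mail("")               # sent as MAIL FROM:<>, like a bounce
            code, _msg = smtp.rcpt(sender)
            return classify(code)
    except (OSError, smtplib.SMTPException):
        return "defer"                  # could not ask; don't reject outright
```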

 5) Regarding the common "phrase search" feature request, it looks
like htdig 3.2 is nowhere near ready to go, so that's not
happening any time soon.

 6) Regarding the common "address hide" feature request, I've 
bumped up the obfuscation slightly by custom hacking MHonArc.
Do people prefer that I mangle email addresses in message 
bodies to something completely indecipherable like GeoCrawler's 
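For comparison, here are two levels of body mangling in Python: a light one that stays readable to humans, and a complete one that replaces each address with an opaque placeholder, in the style of the `[EMAIL PROTECTED]` that appears elsewhere in this thread. The regular expression is a deliberately loose approximation of an email address, not an RFC 5322 parser, and the function names are hypothetical.

```python
import re

# Loose pattern for addresses appearing in message bodies.
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def mangle_lightly(text: str) -> str:
    """Readable to people, but no longer a literal user@host string."""
    return ADDRESS.sub(
        lambda m: m.group(0).replace("@", " at ").replace(".", " dot "), text
    )

def mangle_fully(text: str) -> str:
    """Replace every address with an opaque placeholder."""
    return ADDRESS.sub("[EMAIL PROTECTED]", text)
```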



[Gossip] status

2002-09-07 Thread Jeff Breidenbach

Hi all,

Next generation hardware finally arrived, and is 
looking very good. So far, I'm very happy with the
vendor who seems to have done a great job testing.
Data transfer will begin shortly, not sure how long 
it will take. Expect periodic queuing of mail during 
this time (for example, mail-archive will be in queuing 
mode for all of tonight).



Re: [Gossip] status

2002-03-25 Thread sixtwo rsrd

hi jeff
im the list manager for

we've moved the archives to our own site, and no longer need them in the
mail-archive. ive unsubbed archive@jab and you may delete the folders at
your leisure from your site if you wish.

thanks for the service. it's a pleasure to get something for nothing in this
overly capitalistic world.
Shawn Bega
please ride safely


[Gossip] status

2002-02-26 Thread Jeff Breidenbach

Hi all,

I am now back from a long vacation (which included chatting with
riot police in Salt Lake City a few days ago!) and have tried to
work my way through various mail-archive questions, comments, etc.
I think I've caught up at this point, but if some issue is not
taken care of, bring it up again.

On the front burner:

  * Gossip subscription needs to be fixed. No idea what is wrong
or how to proceed, but this is problematic.

  * I'm increasingly getting (a) spam and (b) requests for spamblocks. For
example, I think my recent signal-to-spam ratio in personal mail
was 6:80 when I got back. Thus I'm finally going to admit defeat
and do address obfuscation. 

Surprisingly, Apache doesn't seem to have an address-obfuscation
module, so I'm probably going to put the obfuscator in MHonArc
and permanently burn the obfuscation into the HTML.
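One common way to burn obfuscation into generated HTML is to write each address as numeric character references: browsers display it normally, while a naive harvester grepping the raw HTML for `user@host` finds nothing. A Python sketch of that technique (not MHonArc's actual code, and the helper names are hypothetical); any harvester that decodes entities will see straight through it.

```python
import re

# Loose pattern for addresses in an HTML fragment.
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def entity_encode(address: str) -> str:
    """Write every character as a decimal character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)

def obfuscate_html(fragment: str) -> str:
    """Encode addresses in an HTML fragment; other text is left untouched."""
    return ADDRESS.sub(lambda m: entity_encode(m.group(0)), fragment)
```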

Anyway, that's the news.



[Gossip] Status of archives?

2001-11-09 Thread Les Schaffer

In checking this morning on a mail list i help administer which is
archived at mail-archive, i noticed a slew of problems. i'm wondering
what's up with the overall health of the archives.

in particular, things i noticed:

1.) no new gossip messages since 10/29. perhaps this is combo of delay
in archiving plus low posting rate.

2.) when i search the archive i administer

the main index shows no new posts since Oct 31. Is the archiving now
delayed by 5 days?

3.) When i look at the date index for the archive:

the output page seems very broken. the last post shown is 10/17, and
after the index reaches back to 09/30, the listing repeats (many)
multiple times.

4.) When i search the archive for marxism list using my name as search
keys, i get only about 24 posts, when i know there are many more than
that in the archives. when i earlier checked the same search i got 19
posts, so the number changed within a half hour of checking. 

when i searched using the name of the list moderator, i found only
about 34 posts, and he truly should have hundreds of posts in the archive.

so something seems amiss.


les schaffer


[Gossip] status: users, attachments, bounces, spam, disk

2001-05-28 Thread Jeff Breidenbach


I'm very pleased to see organizations like The Apache Group using
mail-archive to help further their projects. It's an honor.


With MHonArc 2.4.8 +
I still am not successfully blocking .jpg attachments. Not sure if I
made a mistake or if something isn't working. Configuration follows.


text/plain; maxwidth=87 asis=windows-1252:iso-8859-15
m2h_external::filter; excludeexts="src,vbs,jpg,JPG,jpeg,JPEG" usename useicon subdir 
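Not a diagnosis of the MHonArc setup above, but for illustration: the Python sketch below drops attachments by filename extension, comparing extensions case-insensitively, so `jpg` also covers `JPG` and `Jpg`. It uses the standard `email` package; the exclude list is copied from the config, and the helper names are hypothetical.

```python
import os
from email.message import EmailMessage

# Extensions to drop, compared case-insensitively.
EXCLUDED = {"src", "vbs", "jpg", "jpeg"}

def is_excluded(filename, excluded=EXCLUDED) -> bool:
    """True if the attachment's filename extension is on the exclude list."""
    if not filename:
        return False
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in excluded

def kept_attachments(msg: EmailMessage) -> list:
    """Names of the attachments that would survive the filter."""
    names = []
    for part in msg.iter_attachments():
        name = part.get_filename()
        if not is_excluded(name):
            names.append(name or "(unnamed)")
    return names
```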


I'm bouncing (not dropping) all incoming messages over 100KB.  This
was a conscious decision to actively discourage large messages and
the lists that carry them. Working very well.
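A bounce here means refusing the message with a permanent SMTP error, so the sender learns it was not accepted; dropping would discard it silently. A Python sketch of the size check (in Exim itself this is configuration, `message_size_limit`, not code). Reading "100KB" as 100 × 1024 bytes is an assumption; 552 is the reply code RFC 1870 uses for messages over a size limit.

```python
LIMIT_BYTES = 100 * 1024  # "100KB"; the 1024-based reading is an assumption

def size_verdict(message_bytes: int, limit: int = LIMIT_BYTES) -> tuple:
    """Return the (code, text) SMTP reply for a message of the given size."""
    if message_bytes > limit:
        # Bounce: a permanent error tells the sender the message was refused.
        return 552, "Message size exceeds fixed maximum message size"
    return 250, "OK"
```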


My personal inbox spam continues to increase. The telltale addresses
(including [EMAIL PROTECTED]) don't seem to get corresponding spams. So
it doesn't _appear_ to be from spambots crawling mail-archive, but I'm
keeping wary. We're a pretty sizable target.


Disk continues to fill despite software tweaks. I've asked VA about
the possibility of loaning or donating a large disk array. If that
doesn't pan out, I will approach other storage vendors as well. I
think there is a reasonable chance a company might be willing to
donate excess inventory, especially if they can get a tax writeoff.

By the way, the message count on the homepage is bogus -
it's just a linear estimator tied to df, and doesn't take into
account any of the space efficiency tweaks of the last year or so. I
guesstimate mail-archive holds somewhere on the order of 10+ million
emails right now. I don't have a computationally cheap way to get a
real file count.

jeff@zamboni:~$ df
Filesystem         1k-blocks      Used Available Use% Mounted on
/dev/rd/c0d0p3        482093    196255    260938  43% /
/dev/rd/c0d0p5        964476    496200    419280  54% /usr
/dev/rd/c0d0p6        964476    667964    247516  73% /var
/dev/rd/c0d0p1         23300      3715     18382  17% /boot
/dev/rd/c0d0p7     189746780 170726132   9382052  95% /data


Re: [Gossip] status, mail-archive

2000-10-30 Thread Jeff Breidenbach

>This is then put in a META tag at the top of the page like this

Thanks, went ahead and implemented this solution. All non-ISO-8859-1
localizations will now have a META tag denoting the character set.
Polish is the first such localization I've encountered. :)

Ok, now I'm going to grumble a bit:

  * it's the 21st century and we still don't have unicode everywhere

  * it's the 21st century and we still don't have email
headers specifying language, making automatic localization
of email archives highly improbable.

On the bright side, the two extra localizations are now live. German
and Polish users can now submit names of lists that should be archived.

As for the monthly granularity with MHonArc:

Pros:
  Immediate speedup of MHonArc
  Already implemented on per-list basis
  Secondary benefits from increased disk caching due to lower
  memory use by MHonArc.

Cons:
  Some contortion required if I don't want to break existing URLs
  Adds human interface complexity
  Adds program level complexity
  Lose any lobbying power I might have had towards increasing
MHonArc scalability through windowing
  It feels kludgy and requires me to reverse years of stubbornness. :)

The other obvious software wins include ignoring "cold" lists better, and
switching to ReiserFS (although maybe that's not so easy and I should
wait for the next disk upgrade and/or perhaps a 2.4.x kernel that
incorporates it natively).


$ uptime
  7:57pm  up 224 days,  3:03,  2 users,  load average: 1.22, 1.26, 1.27


Re: [Gossip] status, mail-archive

2000-10-30 Thread Earl Hood

On October 29, 2000 at 12:28, Jeff Breidenbach wrote:

>Bottom line is I need to put another round of attention into
>software efficiency, relatively soon.

If you break up a list into a set of archives (broken down by month),
efficiency should no longer be a problem.

I know you have had some reluctance about this, mainly for low-traffic
lists, but I believe in general it will be best.

An idea is to do the regular monthly-based archives for all lists,
but have a complementary "latest messages" archive for the last X
number of messages.  This way the monthly archives serve a more
purely archival function, while the latest messages archive
is geared to current discussions and works like a newsgroup,
where older messages eventually expire (but are still present in the
monthly archives).
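Earl's scheme, sketched in Python with in-memory structures: every message is filed permanently under its month, and also appended to a bounded "latest messages" view that drops its oldest entry once it holds X messages. The class and method names are hypothetical; a real setup would run MHonArc once per month bucket rather than keep lists in memory.

```python
from collections import defaultdict, deque
from datetime import datetime

class ListArchive:
    """Monthly archives plus a rolling 'latest messages' view (sketch)."""

    def __init__(self, latest_size: int):
        self.monthly = defaultdict(list)         # "YYYY-MM" -> messages
        self.latest = deque(maxlen=latest_size)  # oldest entries fall off

    def add(self, date: datetime, message: str) -> None:
        # Permanent home: the month the message was sent.
        self.monthly[date.strftime("%Y-%m")].append(message)
        # Current-discussion view: old messages expire here automatically
        # but remain in their monthly archive.
        self.latest.append(message)
```

Since each monthly archive only grows during its own month, the work of updating it stays roughly proportional to one month's traffic rather than the list's whole history.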



Re: [Gossip] status, mail-archive

2000-10-30 Thread Stephen Turner

On Sun, 29 Oct 2000, Jeff Breidenbach wrote:
>I'm a bit concerned with the character sets -- the provided
>translations don't use HTML escape characters, like &eacute.
>The Polish translation appears to use the iso-8859-2 character set,
>while the German translation seems to be in straight ASCII --
>I wonder if that is actually ok? I'd prefer using HTML escape 
>characters, but I'm not sure I have the ability / knowledge to go 
>ahead and put them into the translations. Any help is appreciated.

Nit pick: &eacute; not &eacute -- this is a common HTML bug. Unfortunately,
MSIE renders the latter as an e-acute too, so people who only test their
pages on MSIE never realise that it will render in an unintended way on any
conformant browser.

Anyway, Polish can't use the &; things because there isn't any
&lslashed; etc. So what I do with analog is have one extra field at the top
of the language file declaring the character set. This is then put in a
META tag at the top of the page like this:


Statystyki WWW

Statystyki WWW

Program uruchomiony: Pon, 30 Nie 2000 13:54.

You might find this idea useful.
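Stephen's approach, sketched in Python. The language file format here (charset on the first line, then `key=value` strings) is hypothetical and not analog's actual format; the point is only that the declared charset flows into the page's META tag.

```python
import html

def load_language(text: str):
    """Parse a hypothetical language file: the first line is the charset,
    the remaining lines are key=value translations."""
    lines = text.splitlines()
    charset = lines[0].strip()
    strings = {}
    for line in lines[1:]:
        if "=" in line:
            key, value = line.split("=", 1)
            strings[key.strip()] = value.strip()
    return charset, strings

def page_head(charset: str, title: str) -> str:
    """Emit a head section that declares the page's character set."""
    return (
        "<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>"
    )
```

The page itself must then be written in that encoding, e.g. `open(path, "w", encoding=charset)`, or the declaration will be wrong.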

Stephen Turner
  Statistical Laboratory, Wilberforce Road, Cambridge, CB3 0WB, England
  "The new operating system will recover more easily from system crashes."
  (Microsoft, aiming high with Windows Millennium)


[Gossip] status, mail-archive

2000-10-29 Thread Jeff Breidenbach

Hi all,

Here's what's up with mail-archive.

1) I've received localizations for German and Polish
   and will put them into the next point release, maybe

   I'm a bit concerned with the character sets -- the provided
   translations don't use HTML escape characters, like &eacute.
   The Polish translation appears to use the iso-8859-2 character set,
   while the German translation seems to be in straight ASCII --
   I wonder if that is actually ok? I'd prefer using HTML escape 
   characters, but I'm not sure I have the ability / knowledge to go 
   ahead and put them into the translations. Any help is appreciated.

2) We're getting too big again. How do I know? People are complaining
   about archiving latency. I also learned that grep (which is used
   inside the guts, specifically grep -F) doesn't like receiving more
   than 5000 arguments. Note that mail-archive very recently exceeded
   5000 lists :)
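One way around the argument limit, assuming the lookup is a fixed-string match: hand grep the patterns as data with its `-f` option (read patterns from a file) instead of as thousands of command-line arguments. A Python sketch; the function name is hypothetical, and it assumes a `grep` that supports `-F` and `-f`, as POSIX grep does.

```python
import subprocess
import tempfile

def grep_fixed(patterns: list, data_path: str) -> list:
    """Lines of data_path containing any of the fixed-string patterns.

    The patterns go into a temporary file passed with -f, so their number
    is not limited by command-line argument counts.
    """
    if not patterns:
        # An empty pattern would match every line, so short-circuit instead.
        return []
    with tempfile.NamedTemporaryFile("w", suffix=".pat") as pat:
        pat.write("\n".join(patterns) + "\n")
        pat.flush()
        result = subprocess.run(
            ["grep", "-F", "-f", pat.name, data_path],
            capture_output=True, text=True,
        )
    # grep exits 0 on a match, 1 on no match, and >1 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```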

   Bottom line is I need to put another round of attention into
   software efficiency, relatively soon.

