Re: [Gossip] Porting digested new list archives to mail-archive

2015-04-14 Thread Earl Hood
On Mon, Apr 13, 2015 at 11:19 AM, Matt Morgan wrote:

 2. Now the harder one. From Sep 1994 (inception) to Apr 2006, the lists
 were hosted using L-Soft's LISTSERV software, which did not keep archives.
 However, I have a complete set of all traffic from that time period, but
 they are all in Daily Digest format, i.e., with a Table of Contents in the
 front and several emails afterwards. I have MOST (but not all) of these
 available as MIME digests with each message in a different MIME multipart
 segment. I also have ALL of them available as a non-MIME digest, with a
 fixed text separator (like a row of ) between messages. I would propose
 to send these as an mbox format of digest files but each email in each
 digest message would still need to be separated out. (a) Can mail-archive do
 this digest parsing, or do I need to find or write a script to do this
 myself? (b) If mail-archive can do it, do you have a preference for MIME vs.
 non-MIME digest? (c) And if MIME, can you handle the few for which I only
 have non-MIME digests?

 I can't help with this one; skipping.

For MIME digest messages, MUAs like nmh are able to extract such
messages out into individual files, which can be subsequently packed
into mbox format.

If you have mhonarc installed, you can use the mha-decode with the
-dcd-digest option to have all digest messages extracted into separate
files.
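As a rough illustration of the extraction step described above, the following Python sketch pulls the enclosed messages out of one MIME digest and appends them to an mbox file. It is not how nmh or mha-decode work internally; it relies only on Python's standard email and mailbox modules, and the function names explode_digest and digest_to_mbox are made up for this example. Non-MIME digests with a text separator are not handled.

```python
import mailbox
from email import message_from_bytes


def explode_digest(digest_bytes):
    """Return the messages enclosed in one MIME digest message."""
    digest = message_from_bytes(digest_bytes)
    found = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element
            # list holding the enclosed message. Do not descend further,
            # so forwarded mail inside a message stays attached to it.
            found.append(part.get_payload(0))
        elif part.is_multipart():
            for sub in part.get_payload():
                visit(sub)

    visit(digest)
    return found


def digest_to_mbox(digest_bytes, mbox_path):
    """Append every message enclosed in the digest to an mbox file."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist
    try:
        for message in explode_digest(digest_bytes):
            box.add(message)
        box.flush()
    finally:
        box.close()
```

Calling digest_to_mbox() once per digest, in date order, should yield a single mbox with one entry per original message.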

--ewh

___
Gossip mailing list
https://www.mail-archive.com/gossip@mail-archive.com
https://www.mail-archive.com/cgi-bin/mailman/options/gossip


Re: [Gossip] search going ok

2006-09-27 Thread Earl Hood
On September 27, 2006 at 17:44, Olly Betts wrote:

 On Mon, Sep 25, 2006 at 03:16:06PM -0700, Jeff Marshall wrote:
  In major search engines like Google and Yahoo, this relevance ordering 
  makes the most sense.
 
 Google Groups and Google News both offer a choice of date or
 relevance (date being most recent first).

In my experience, I have found date-based ordering useful when
searching mail.

 Incidentally, Gmane's search offers both these and also reverse date,
 which shows the oldest matches first - not so useful in general, but
 people requested it and it's easy to do once you've implemented sorting
 by date.  It lets people find their first post if nothing else!

Reverse date sorting is useful, and I use it on occasion on the
various archives I have maintained and search.

I do think such a facility may be useful for a certain type of user,
but chronology can be considered an intrinsic property of mail/news
archives.

--ewh
-- 
Earl Hood, [EMAIL PROTECTED]
Web: http://www.earlhood.com/
PGP Public Key: http://www.earlhood.com/gpgpubkey.txt

___
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip


Re: [Gossip] The Great UTF-8 SWITCHEROO

2005-06-29 Thread Earl Hood
On June 29, 2005 at 23:19, Jeff Breidenbach wrote:

 I've seen a small but not tiny number of messages where the
 Mail User Agent is sticking raw iso-8859-1 characters (outside
 the ASCII range) inside the Subject: header. And not using
 an RFC 2047 encoding. Our software is barfing on those
 characters when we convert to UTF-8, for example:
 
 http://www.mail-archive.com/brygforum@haandbryg.dk/msg09684.html

They are illegal.  Only ASCII characters are allowed (hence
the reason for non-ASCII encoding in the MIME specs).

However, some locales bend the rules.

With mhonarc, you can try the following trick:

<CharsetAliases>
iso-8859-1; plain
</CharsetAliases>

plain is the special charset name for characters in message header
fields that are not part of a non-ASCII encoded string.  By default,
mhonarc treats plain as us-ascii, but you can use the above
resource to change this.

With respect to TEXTENCODE, this will cause the plain text to be considered
as iso-8859-1 text for purposes of encoding (in your case utf-8).
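The effect of the alias can be pictured with a small Python sketch: header bytes that are not part of an encoded word are first tried as ASCII and, failing that, interpreted as iso-8859-1 before being re-encoded as UTF-8. This only mimics the idea; it is not mhonarc code, and decode_raw_header is a hypothetical helper name.

```python
def decode_raw_header(raw, fallback="iso-8859-1"):
    """Decode unencoded header bytes, assuming `fallback` for 8-bit data."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Raw 8-bit characters in a header are not legal, so the charset
        # is a guess; iso-8859-1 is the assumption made here.
        return raw.decode(fallback)


def header_to_utf8(raw):
    """Return the header value as UTF-8 bytes for the archive page."""
    return decode_raw_header(raw).encode("utf-8")
```

Because every byte value is valid in iso-8859-1, the fallback never fails, but it will produce the wrong characters if the sender actually used another charset.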

--ewh

P.S. You may want to also look at DEFCHARSET for text message bodies
since the problem you cited can also happen for text bodies.  You
may want to add the following to your mhonarc resource files:

<DefCharset>
iso-8859-1
</DefCharset>


P.P.S.  The above changes will only affect new messages unless you
RECONVERT existing messages.



Re: [Gossip] The Great UTF-8 SWITCHEROO

2005-06-24 Thread Earl Hood
On June 24, 2005 at 06:17, Jeff Breidenbach wrote:

 It's been brought to my attention that the Great UTF8 Switcheroo
 on June 19th may have had some side effects. Some lists are 
 showing  some corruption on index pages. Not a complete disaster,
 but fairly annoying. 

Not exactly corruption.  What you are seeing is the raw version
of the subject text.

 For example, on brygforum, things look reasonably ok after June 19th,
 but before then subject lines are misdecoded and show up prepended
 with strings like  =?iso-8859-1?. 

You are right that the transition change is the triggering factor.

The technical reason is that mhonarc assumes that all non-ASCII
encoded data gets decoded when a message is first read when TEXTENCODE
is enabled.  Therefore, a separate routine is used when converting
resource variables (like $SUBJECT$).

Ideally, TEXTENCODE is enabled when an archive is initially created.
I did not consider the implications when TEXTENCODE is enabled
for existing archives.

It should be technically possible to write a script to update an
existing mhonarc database file so all non-ASCII encoded information
is decoded and converted.  Drop me a note if you are interested.
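The decoding step such a script would need can be sketched in Python with the standard email.header module. This shows only how a stored raw subject like "=?iso-8859-1?Q?...?=" becomes readable text; reading and rewriting the mhonarc database file itself is not covered, and decode_subject is a hypothetical name.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Decode RFC 2047 encoded words in a stored subject string."""
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() joins them back into one Unicode string.
    return str(make_header(decode_header(raw_subject)))


def subject_as_utf8(raw_subject):
    """Return the decoded subject as UTF-8 bytes."""
    return decode_subject(raw_subject).encode("utf-8")
```

Subjects that contain no encoded words pass through unchanged, so the function can be applied to every stored subject.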

--ewh



Re: [Gossip] Consecutive spaces not displayed in some cases

2005-03-30 Thread Earl Hood
On March 29, 2005 at 13:26, Jeff Breidenbach wrote:

 If someone can provide me a raw email message illustrating the
 problem, I can try and look at it.
 
 I've sent the raw message to Earl. In any case, the trial
 configuration change on the rsync mailing list will continue.

It looks like a bug with mhonarc.  Mozilla renders the message
as expected, while mhonarc makes an error starting at
if { nice +20 ... at the end of the script example.

Jeff, please submit a bug report to savannah so we can formally
track it.

Thanks,

--ewh




Re: [Gossip] Consecutive spaces not displayed in some cases

2005-03-30 Thread Earl Hood
On March 29, 2005 at 18:56, John Van Essen wrote:

 But if mhonarc reflows long lines in normal plain text and uses pre
 (when it could have allowed it to flow), it seems to be consistent to do
 the same with (supposedly) flowed text.

The message cited did not have very long lines, at least the lines
showing the problem.

Mhonarc does not reflow plain text messages.  The exception is when
the message is tagged with format=flowed.  In this case, RFC 2646
semantics are followed during conversion (unless the disableflowed option
is set).

The reflowing of non-flowed plain text messages is actually the
maxwidth enforcement.  By default, there is no maxwidth limit, but
mail-archive.com has enabled it for stylistic reasons.  In this case,
long lines are broken into multiple lines.  No paragraph reflowing is
actually done.
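The difference between line breaking and paragraph reflowing can be shown with a short Python sketch. It breaks only the lines that exceed the limit and never joins short lines together. The breaking rule used here (last space within the limit, else a hard cut) is an assumption made for the example, not mhonarc's actual algorithm, and enforce_maxwidth is a hypothetical name.

```python
def enforce_maxwidth(text, maxwidth):
    """Break lines longer than maxwidth; leave all other lines alone."""
    if maxwidth < 1:
        raise ValueError("maxwidth must be at least 1")
    out = []
    for line in text.split("\n"):
        while len(line) > maxwidth:
            # Prefer the last space that keeps the piece within the limit.
            cut = line.rfind(" ", 0, maxwidth + 1)
            if cut <= 0:
                # No usable space: cut in the middle of the word.
                out.append(line[:maxwidth])
                line = line[maxwidth:]
            else:
                out.append(line[:cut])
                line = line[cut + 1:]
        out.append(line)
    return "\n".join(out)
```

Short lines keep their original breaks, which is why the result can look ragged compared with a true reflow.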

--ewh




Re: [Gossip] Consecutive spaces not displayed in some cases

2005-03-29 Thread Earl Hood
On March 27, 2005 at 12:43, Jeff Breidenbach wrote:

  d) use the m2h_text_plain::filter disableflowed setting
 
 Ok. I've disabled reflow on rsync@lists.samba.org as a guinea pig.
 John, please monitor new messages to the list for a few weeks and let
 me know how it works out. If there are no bad side effects, we can
 consider a site-wide configuration change.

If someone can provide me a raw email message illustrating the
problem, I can try and look at it.

The format=flowed setting in email messages has well-defined
semantics, and mhonarc (more specifically, the m2h_text_plain::filter
filter) tries to follow the RFC as best as it can.   Therefore, either
the messages are mis-tagged as flowed or there is a bug in the
filtering code.
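For reference, the core of the format=flowed rules can be sketched in Python: a line ending in a space continues on the next line, a leading space is "stuffing" and is removed, and leading ">" characters give the quote depth. This is a simplified reading of the RFC written for illustration; it is not the m2h_text_plain::filter code, and unflow is a hypothetical name.

```python
def unflow(text):
    """Return (quote_depth, paragraph) pairs from format=flowed text."""
    paragraphs = []
    buf, buf_depth = "", None
    for line in text.split("\n"):  # assumes line endings are already "\n"
        depth = len(line) - len(line.lstrip(">"))
        body = line[depth:]
        if body.startswith(" "):
            body = body[1:]  # undo space-stuffing
        if buf_depth is not None and depth != buf_depth:
            # Quote depth changed: close the open paragraph.
            paragraphs.append((buf_depth, buf))
            buf, buf_depth = "", None
        if body.endswith(" ") and body != "-- ":
            # Soft line break: the paragraph continues on the next line.
            buf += body
            buf_depth = depth
        else:
            # Fixed line (or signature separator): paragraph ends here.
            paragraphs.append((depth, buf + body))
            buf, buf_depth = "", None
    if buf_depth is not None:
        paragraphs.append((buf_depth, buf))
    return paragraphs
```

Spaces inside a line are left untouched, so a message that loses consecutive spaces after this step points at either mis-tagged input or a problem later in the conversion.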

Please check out RFC 2646.

--ewh




Re: [Gossip] more localization (l10n)

2005-02-16 Thread Earl Hood
On February 15, 2005 at 02:00, Jeff Breidenbach wrote:

 I just reviewed all the language translations and am looking for
 volunteers to help make some changes. This is technical work
 involving character sets and doesn't necessarily require language
 fluency.

Looks like you may be able to write a Perl script that uses
the Encode module to do the conversion for you.  For example,
you can use Encode to convert koi-8 to utf-8, and then convert
the UTF-8 values to numerical entity references.

Have a look at MHonArc::CharEnt's _utf8_to_sgml() routine.
It basically provides the conversion code to do what you want.
You only need the part for Perl >= 5.6 if using later versions
of Perl.
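The same two-step conversion can be sketched in Python rather than Perl (a substitution made for this example; it does not use Encode or MHonArc::CharEnt). The source charset name koi8-r and the function name to_numeric_entities are assumptions.

```python
def to_numeric_entities(raw, source_charset="koi8-r"):
    """Decode bytes and replace non-ASCII characters with &#NNNN; references."""
    text = raw.decode(source_charset)
    # "xmlcharrefreplace" turns each character that cannot be encoded as
    # ASCII into a decimal numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

ASCII characters, including "&" and "<", are passed through unchanged, so any HTML escaping still has to be done separately.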

--ewh




Re: [Gossip] status report + look and feel questions

2004-11-24 Thread Earl Hood
On November 24, 2004 at 10:29, Fred H Olson wrote:

 On my lists I still find that requiring posts to come from subscribed
 addresses keeps virtually all spam from being distributed. I've had
 very few if any instances of spammers subscribing to a list to spam it.
 Does mail-archive.com archive lists to which anyone can post?

List administration is handled by the list owners not mail-archive.com.
Therefore, if the list owner allows anyone to post to the list, then
the messages will get archived (unless mail-archive.com spam filters
believe such messages are spam).

 As one last precaution I have new subscribers first messages moderated
 (sent to the reject page) so I'd catch a subscribed spammer's first
 message.  This has the added advantage of catching some please
 unsubscribe me messages from people who never post anything else.

Something that may be good to do for list administrators.  Mail-archive.com
does not perform any list administration functions.

 -- Advertising on mail-archive.com --
 Regrettable that you have to have it but it's more tolerable than yahoo's.
 With my browser (Mozilla 1.4.1) the ads occasionally prevent the last few
 characters of a message line from being displayed. Example, in:
 http://www.mail-archive.com/mpls%40mnforum.org/msg32125.html
 The end of the third line on my display reads

What operating system?  Message looks fine to me, but I'm using a
later version of Mozilla.

 The list name link in the upper left corner of a message page and of index
 pages bring up an index page.  Such a link on index pages is pretty
 useless, it would be much better to link to the lists info page (I think
 all lists should and most do have these) which in turn has description of
 list, subscription info etc. Are there links somewhere to contact info
 for archived lists?

Mail-archive.com is as automated as possible, including the detection
of new lists to archive.  Helps keep operational costs down.  Right
now, there are no facilities for list administrators to register
list info, and such capabilities would require human-based review
for content.

I believe the folks at mail-archive.com have considered additional
features similar to this, but such things will probably not get added
unless it can be automated and done in a secure fashion.

--ewh

___
Gossip mailing list
[EMAIL PROTECTED]
http://jab.org/cgi-bin/mailman/listinfo/gossip



Re: [Gossip] Beta of new design for The Mail Archive

2004-07-25 Thread Earl Hood
On July 25, 2004 at 13:29, Jeff Breidenbach wrote:

 I would add index and {next,previous}-by-date links to the bottom of
 each message page.
 
 Do people have a specific preference where this would go and what it might
 look like?

Somewhere at the bottom :)  Replication of the nav links is good since
a person does not have to scroll back to the top when they reach the
end of the page.

You can replicate the framing design at the top (excluding the mail
archive logo and list name) on the bottom and replicate the nav links
within the grey area.

Another semi-related suggestion is to define link tags defining
relationships that some browsers recognize and provide built-in navigational
buttons for: http://www.w3.org/TR/html401/types.html#type-links.
Useful ones would be next, prev, contents, index.

--ewh



Re: [Gossip] Spam incident today over Gossip mailing list!

2003-12-27 Thread Earl Hood
On December 26, 2003 at 15:13, Jeff Breidenbach wrote:

 Gossip is an "only members can post" mailing list, and to join you have
 to go through a confirmation process. Until today's incident, I would
 have said we were pretty much spam proof because of that. (For those
 looking at the gossip archives, there is some spam mixed in, but none
 of that went over the actual list -- it's all an artifact of
 Mail-Archive getting spam with extremely funky headers causing them to
 archive with gossip). Anyway, I am shocked! At this point I have no
 idea if this is some sort of automated attack on Mailman mailing
 lists, or if we have a spammer manually targeting gossip. I've of
 course removed the spammer from gossip. If there are more incidents, I
 guess I'll have to figure out additional measures. This is definitely
 a new low and I'm sorry that this garbage got through.

Spammers are getting more aggressive.  For example, I have had
two incidents of spammers using bug submission for the MHonArc
project to spread their message.  It is also common for some
to cruise web forums and post spam messages.

--ewh

___
Gossip mailing list
[EMAIL PROTECTED]
http://www.mail-archive.com/cgi-bin/mailman/listinfo/gossip


Re: [Gossip] Re: oodles of spam lists at Mail-Archive.com?

2003-11-21 Thread Earl Hood
On November 21, 2003 at 10:20, Dan Kegel wrote:

  Some would argue that spam exists precisely because running a mail
  server is so economical.  Perhaps it should be more expensive.
  
  Small ISPs and organizations can relay mail via their DSL provider's
  servers, just like individuals do.  Larger organizations can pay for a
  real Internet connection.  I see no problem.
 
 I'm with Pat on this.   As someone who's had occasion to
 worry about security since 1992 or so, I fully support
 the idea that ISPs should by default block outgoing SMTP
 from customers (and encourage customers to
 use the ISP's SMTP relay).

Then you start getting into some potentially political and legal
problems.  I.e.  What is the nature of the service?  Typically, your
service provides a Net connection allowing TCP/IP traffic with a notice
that you would not abuse the service.

Now, you advocate that blocking specific protocols is okay, but that
is not what many people sign up for.  Such logic backs ISPs that start
blocking other traffic (like IPSec) to force customers to purchase
more expensive service agreements (which I believe some ISPs have
done).

With that said, blocking SMTP may be good policy, BUT ISPs must
clearly indicate this behavior to customers and make sure it is
mentioned in the service agreement.

Also, such policy will probably only be enforced on home users.
Those that choose to pay for better services will be exempt of such
rules; ISPs want more money and they are happy if their servers
receive less of a load.

Remember, ISPs are in business to make money, and they play both
sides of the field on the spam debate.  For example, spammers use
a lot of bandwidth, and the ISPs get money for such usage.

BTW, does anyone have stats on the number of spam messages that
come from dynamic address ranges?  Especially U.S.?  It seems
to me that much spam is relayed through foreign countries.

Also, how do you know a range is dynamic?  Whois database does not
formalize such information, and such policies can change at any
time for whoever owns a specific range.

 The situation now is terrible, and somewhat analogous
 to how operating systems used to ship with all services
 on by default.   It was a big improvement when OS's
 started shipping with services off by default, and
 doing the same thing with outbound SMTP at ISPs would
 bring a similar improvement.

As I noted in a previous message, it will not stop spam.  Spammers
that use worms to infest other systems will just adjust tactics
by using the outgoing SMTP server settings to send out spam.

Someone suggested that ISPs may filter outgoing mail, but personally,
I find this worrisome on privacy grounds, and technically, it doubles
the load of ISPs.  Plus, for it to work, ISPs will eventually have to
notify their customers when they detect questionable out-bound mail,
which will raise a political firestorm about privacy and PR problems
for ISPs.

If you really want to defeat spam, educate the idiots that actually
respond to spam messages to stop responding.

--ewh



Re: [Gossip] spaminator

2003-10-17 Thread Earl Hood
On October 16, 2003 at 22:17, Jeff Breidenbach wrote:

  5) I've been reading up on the latest anti-spam weaponry.
 Crazy stuff. In particular, I see that one thing people do
 is generate poison email addresses on the fly - which
 encode the IP of the harvesting spambot. Clever.
 
 These emails are valid in that they lead to a teergrube - an MTA
 that recognizes these addresses and tries to slow down the
 spammer's MTA as much as possible. But it also seems that this
 would be a decent way to create a spambot black hole list based on
 IP. And because HTTP isn't usually relayed like SMTP is, this
 might actually work. Is anyone already doing this? Any experts
 want to comment?

The problem is the assumption that the IP address is legally controlled
by the spammer.  There have been incidents where spammers are infecting
regular people's computer systems (generally through some flaw with
Windows) in order to send out spam.

I think some verification of the IP is needed to see if the owner
of the address has been a victim or is an open relay that the owner
refuses to close.

--ewh



Re: [Gossip] Survey: Google AdSense on Mail-Archive?

2003-07-15 Thread Earl Hood
On July 14, 2003 at 22:48, Jeff Breidenbach wrote:

 Earl, I'm also worried about me changing my mind - where to
 put the AdSense, whether to use it at all. So I'm
 tempted to make the changes as unpermanent as possible
 at first. Hopefully there is an Apache module that can do a 
 simple search and replace on the fly for HTML.

Then doing SSI is probably the best approach.  You can update your
MHonArc resource file to include an SSI directive.  When you want
ads to be displayed, you just enable SSI for HTML files located
in the archives.  If you do not want the ads, you can disable SSI
processing in the server.  Therefore, the SSI comment would have
no effect.

--ewh



Re: [Gossip] next anti-spambot steps

2003-02-21 Thread Earl Hood
On February 21, 2003 at 10:43, Kir Kolyshkin wrote:

 BTW ASPseek does (and internally stores all data in UTF-8, so it's not a 
 problem to have many different encodings in one DB, including even CJK 
 ideographs).
 
 I suspect* if you will correctly specify charset (by having, say
 
 <HEAD>
 ...
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1251">
 
 in HTML document, ht://Dig will understand it and handle the document 
 correctly.
 
 * DISCLAIMER: I'm not a ht://Dig expert, but rather ASPseek guru :)

The htdig site states that it supports iso-8859-1 and
the standard HTML entity references.  I did not see anything else
about other encodings except that UTF-8 was on the TODO list.

--ewh



Clueless (was Re: [Gossip] Problem with OPERATION Keyword in ASN.1)

2003-02-14 Thread Earl Hood
On February 14, 2003 at 18:27, Stephen Turner wrote:

 My best guess is that they google for some information, find a semi-relevant
 mail on mail-archive, and then try and find out who runs this helpful mail
 archive service that can answer their questions.

I've gotten the same type of messages from time-to-time also, so I am
sure the problem is not unique to mail-archive.  I've gotten in the
habit of ignoring them.

  It's weird; gossip has 135 subscribers and I'd rather not
  see us get this type of spam but I have no idea how
  to prevent it. Maybe I should expand the footer of
  gossip messages to remind everyone of the topic?

Maybe you should add a very prominent notice at
http://www.mail-archive.com/faq.html#support.  Maybe something like:

  NOTE: The [EMAIL PROTECTED] is strictly for discussions related
to the mail-archive.com service, AND ONLY the mail-archive.com
service.  Messages related to the content archived on
mail-archive are NOT appropriate for the list and will
be ignored.

Examples of appropriate topics for [EMAIL PROTECTED]:

  * My list does not show up in the archives.
  * List messages are not getting archived.
  * Searching is not working.
  * ...


 I wondered if you should call the list
 [EMAIL PROTECTED], but I doubt
 even that would help: I suspect that the people who are posting these
 questions haven't worked out that the information they've googled for wasn't
 provided by mail-archive. 

I suspect the people who post these questions are completely clueless
and not qualified to solve the problem they are asking about.

--ewh




Re: [Gossip] Some Messages not archived to wedi-transactions

2003-01-08 Thread Earl Hood
On January 7, 2003 at 15:42, William J. Kammerer wrote:

 The listserver is managed by an unrelated company.  They would refuse to
 look into the problem, as I have already asked them why messages arrive
 so late at my mail server (i.e., two day delays sometimes) - and they
 say it is my problem (at my ISP)!  At least in the case of describing my
 problems, I can show them my headers and tell them when my mail server
 received the messages.

Some service.  You may want to consider hosting the list yourself
or find a better provider.

I think they are wrong about it being your ISP.  The sample header
you provided shows received header dates of 4 Jan 2003, but the
Date: field having a 2 Jan date.  Since the message is what the
listserv creates, the received headers of when the message was
sent by the author to the listserv are not present.

From what I see of the header, I fail to see how the list service
provider can conclude it is your ISP's problem.  Have you polled
other list subscribers about times they received the message?

 But I am fairly certain that I have received all messages posted to the
 listserve in question between 12-30 and today.  The messages I described
 awhile back on Gossip were received by me, but were not archived by Mail
 Archive. This is a common occurrence (on various of the WEDI listserve
 archives).  I discover it quite frequently when attempting to refer
 someone to a posting via URL - only to find the posting is missing from
 the archive.
 
 I suppose it's possible that I receive all messages (even if some are
 delayed), while Mail Archive does not.

Well w/o decent cooperation from your list hosting provider, it
may never be known.  However, we can make reasonable speculations on
what it may be.

First, it appears the missing messages occurred when there were large
delays in the listserv sending out messages.  If you have no other
cases when messages are lost, then it provides more weight that it
is a listserve problem.

Second, no other reports of select missing messages for other
mail-archive.com archives have been reported for the time period in
question.  If mail-archive was somehow involved, then it would seem
other archives would have the same problem.  If no one else reports
a similar problem to what you have, it again puts more weight that
it is a listserve problem.

If it is really important to have the messages archived, have a
look at http://www.mail-archive.com/faq.html#import.  You could
resend the messages you got back to mail-archive.com as described
in the FAQ in order to get them to show up in the archive.

--ewh

P.S. BTW, the Lyris listserv practice of changing the message-id
violates RFC 2822.




Re: [Gossip] Some Messages not archived to wedi-transactions

2003-01-07 Thread Earl Hood
On January 7, 2003 at 10:46, William J. Kammerer wrote:

 If you searched on this Message-ID in your logs, I would not have
 expected you to find a match because the Message-ID is specific to the
 recipient.  The header I showed are specific to me as recipient, and the
 headers in the messages sent to archive@mail-archive.com would look
 somewhat different as the Lyris listserver assigns unique Message-IDs
 for every recipient.

IMHO, a questionable practice since it screws up references (for
discussion threads) and makes tracking delivery errors harder (like
in this case).

Looking at the header you did provide:

LYRIS-14922627-168882-2003.01.02-15.00.40--wkammerer#[EMAIL PROTECTED]

it could be implied that the ID for other users would be:

LYRIS-14922627-168882-2003.01.02-15.00.40--[useraddress]@lists.wedi.org

where [useraddress] is the subscriber's address with @ replaced with
a #.  Therefore, I would guess the Message-ID for archive@mail-archive.com
would be:

LYRIS-14922627-168882-2003.01.02-15.00.40--archive#[EMAIL PROTECTED]

Of course, this is just guessing.  Since a timestamp is part of the
ID, it could also vary if it is based on when the message was sent to the
recipient.  Therefore, one may have to grep for something
like (using regex notation):

LYRIS-14922627-168882-2003\.01\.02.*--archive#mail-archive\.com@lists\.wedi\.org

Assuming the day part is the same and only the time part could vary.
To be more general:

LYRIS-14922627-168882-.*--archive#mail-archive\.com@lists\.wedi\.org

If the LYRIS-14922627-168882 is sufficiently random, the
above should be sufficient in searching for the message.  The
archive#mail-archive\.com could probably be dropped from the
expression to be extra forgiving and still avoid potential false
positives.
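The search described above could be scripted along these lines in Python. The ID layout is the guess made in this message, not documented Lyris behavior, and lyris_id_pattern is a hypothetical helper name.

```python
import re


def lyris_id_pattern(prefix, recipient, list_domain):
    """Build a regex for the guessed per-recipient Lyris Message-ID.

    prefix      -- fixed leading part, e.g. "LYRIS-14922627-168882"
    recipient   -- subscriber address; "@" is assumed to become "#"
    list_domain -- domain that ends the ID, e.g. "lists.wedi.org"
    """
    local = recipient.replace("@", "#")
    return re.compile(
        re.escape(prefix)
        + r"-.*--"  # timestamp part, which may vary per recipient
        + re.escape(local)
        + "@"
        + re.escape(list_domain)
    )


def matching_lines(lines, pattern):
    """Return the log lines that contain a matching Message-ID."""
    return [line for line in lines if pattern.search(line)]
```

For example, matching_lines(open(path), lyris_id_pattern("LYRIS-14922627-168882", "archive@mail-archive.com", "lists.wedi.org")) would scan a log file at a path of your choosing.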


BTW, some list management software may not do auto-retries on a failed
mail delivery.  For example, for one list, if unable to deliver a
message, the list software sends a status message to the recipient of
the problem.  If repeated tries of sending the status message fail,
the address is auto-unsubscribed.  If successful, the status message
contains instructions for the recipient on how to retrieve the past
undeliverable messages.  Since mail-archive.com is all automated,
such manual retrieval methods would not be supported.

It seems that it would be more effective for you to check out your
listserver's delivery logs to track down the problem.  If such logs
do not exist, you may want to enable such a feature, if available,
to avoid burdening mail-archive, and others, when troubleshooting
errors when it is not even clear if mail delivery to mail-archive
actually occurred for the messages in question.

--ewh




Re: [Gossip] Some Messages not archived to wedi-transactions

2003-01-05 Thread Earl Hood
On January 5, 2003 at 14:50, William J. Kammerer wrote:

 Earl: Thanks for the suggestion.  But as I wrote, most - but not all -
 messages do get posted to the archive.  Since X-No-Archive: yes would
 most likely be something provided by the Listserve administrator for
 every message, we should assume that this MIME header is not used at
 all, and is not the cause of the messages not being archived.

The X-No-Archive: can be added by the Sender directly.

Side Note: I think if Listserv is removing such headers before
resending to the list, this is a serious bug with Listserv.

 In order to further illuminate the problem, I have included one of the
 message's headers below.  The message was posted by me on Thu, 2 Jan
 2003 15:00:56 -0500.  I have every reason to believe it was received by
 the listserver immediately, and successfully distributed to many
 subscribers within a short period of time - I know this because I often
 get responses from correspondents which include the text of my message.
 But in this case, you can see that I did not receive my copy of the
 message from the listserver till 4 Jan 2003 03:16:24 - - i.e., 01-03
 at 10:16 PM ET, nearly 32 hours later!!  And yes, the clock is correct
 in the Date: message; as you can see, it's a message sent by me!

I believe the date of the messages themselves does not matter.

It may be due to delivery queuing by MTAs, either on your end or
mail-archive.com end.  Since some MTAs may be configured to queue for
a few days, it may be possible the messages will show up in the next
day or two.

How does Listserv deal with failed deliveries?

 Since the message has not appeared in the Mail Archives at
 http://www.mail-archive.com/wedi-transactions@lists.wedi.org/, I would
 assume either 1) Mail Archive never received a copy itself (not
 implausible considering it took 32 hours to get to me!), or 2) Mail
 Archive did not archive such a late-arriving message because it was
 confused.

1) is possible and 2) is highly improbable.  I'm not aware of any
date ordering limitations in the mail-archive service.

--ewh




Re: [Gossip] data transfer status

2002-09-17 Thread Earl Hood

On September 17, 2002 at 13:13, Kir Kolyshkin wrote:

 Jeff Breidenbach wrote:
 
  There are still some network details and configuration still left to
  do on the new machine, including:
  
   Mail Transfer Agent [exim]
 
 Can you please elaborate on why you have chosen exim?

If you are just curious why something like sendmail is not used,
I think any choice of software is a matter of what best fits the
needs you have.  My guess on why Jeff goes with exim is that it is
lighter weight than sendmail and is purported to be easier to configure
than sendmail.  Some also say that exim has better anti-spam support.

--ewh




Re: [Gossip] No Messages Archived After Registrar Change

2002-08-29 Thread Earl Hood

On August 29, 2002 at 13:19, [EMAIL PROTECTED] wrote:

 My list is setup to send messages to mail-archive.  It was working, then
 when I went through the hiccup when I changed registrars for my domain
 name.  Everything came back online within one day, except no new messages
 appear on the archive.
 
 Any ideas?  I removed and re-entered the mail-archive email address in my
 list with no luck.

Jeff has informed this list that mail-archive.com is going through
some growing pains, and there will be some hiccups in the service
until the new hardware is in place.

--ewh




Re: [Gossip] Anti-spam measures?

2002-03-21 Thread Earl Hood

On March 20, 2002 at 19:47, Jeff Breidenbach wrote:

 Now the bad news is that I set up a honeypot address to catch spam as
 soon as I added this obfuscation, and that's already caught one
 piece of spam, indicating a harvest. This is depressing, and I suspect
 Mail-Archive will ultimately lose this arms race to the bad guys.

A general practice that people can follow to help minimize
harvesting is to not include your regular email address in the body
of messages that you compose.  It is typically unnecessary since
practically everyone will key off what is in the header (which is
done automatically by the `repl' capability of users' MUAs).

Also, for those that reply to messages and quote the message that
is being replied to, DO NOT include email address headers.  I notice
that apps like Outlook automatically do this (an annoying behavior).

--ewh

P.S.  There are legal movements to outlaw spam, so hopefully,
the ultimate solution will be a legal one and not a technical one.
Technical ones are hard, if not impossible, to achieve since any smart
harvester programmer can add heuristics to deal with common obfuscation
techniques used by people and/or tailor harvesting programs to
specific sites that are known to contain many obfuscated addresses.
(Jeff, your `Reply' to button can be exploited).

P.P.S. Sometimes your address may be given out by someone you do
not know or by some mechanism outside of your control.  For example,
there was a dialup ISP that I used for a few years.  I never gave out
the mail address assigned to me since I had other addresses I used.
However, soon after I signed up, I started getting a ton of spam.
The ISP said they do not give out any addresses; however, I suspect that
either someone has access to their subscriber list and is selling it or
someone is able to get at the information without the ISP knowing it.




[Gossip] MHonArc DOWNGRADE ALERT! (was Re: corruption problem )

2001-11-14 Thread Earl Hood

On November 12, 2001 at 21:27, Jeff Breidenbach wrote:

 I seem to be experiencing a systematic corruption problem.  For
 example, in the last half-dozen more recent entries of this date index
 [1], the message pages are nonexistent and listed under the same URL.
 
 The version of mhonarc running on this archive has been
     2.early -> 2.4.9 -> 2.5.0 -> 2.4.9 -> 2.5.0
 
 MHonArc 2.5.0 is not generating any warnings and is returning a good
 return code. What steps are suggested for diagnosis, and are there any
 suggestions for a fix? Rebuilding the archive from scratch is possible
 but not desirable, due to the large number of archives
 affected. Currently my top priority is stabilizing the system.

I just remembered that when upgrading to v2.5, there are some
data format changes to some items in the .mhonarc.db file.  Pre-v2.5
archives are automatically upgraded.  From the release notes:

  * If upgrading from v2.4.x, or earlier, reference and follow-up
    information of a message is now stored in a different format in the
    database (and internally). MHonArc will auto-update older archives to
    the new format, so no action should be required on your part.

However, if you downgrade, you will have problems, i.e., some data
in the .mhonarc.db is incompatible with previous versions.  I do
not know offhand, without more analysis, what problems to expect
or whether this is the cause of the problems you noted above.
Reference and follow-up information will definitely get screwed up.

The only way to safely downgrade is to pre-process the .mhonarc.db
file to convert from v2.5 format to v2.4 format to avoid data loss.

Jeff, I think the only way to properly fix your problem is to
regenerate archives. 

As for the problems for the Cygnus folks, I would avoid downgrading
if possible.  If you feel it is needed, I can help out with what steps
are needed to pre-process your .mhonarc.db files.  Or, you could
run v2.4.9's version of mha-dbrecover for each archive to reconstruct
.mhonarc.db files in v2.4.9 format (Jeff, you cannot do this since
you have already processed message page files and the reference information
contained within them will have been lost for any pages edited).

For those interested in a pre-processing approach to .mhonarc.db, 
what would need to be done can be determined by the update routine
in mhopt.pl that updates pre-v2.5 archives.  Here is the routine:

sub update_data_2_4_to_later {
    my($index, $value);
    while (($index, $value) = each(%Refs)) {
        next  if ref($value);
        $Refs{$index} = [ split(/$X/o, $value) ];
    }
    while (($index, $value) = each(%FollowOld)) {
        next  if ref($value);
        $FollowOld{$index} = [ split(/$bs/o, $value) ];
    }
    while (($index, $value) = each(%Derived)) {
        next  if ref($value);
        $Derived{$index} = [ split(/$X/o, $value) ];
    }
}

You will notice that list data has been converted from the old Perl 4
hack of storing lists within another data structure to using Perl 5
anonymous arrays.  The reverse of this process will have to be done
if you want to patch the .mhonarc.db files directly vs trying the
v2.4.9 mha-dbrecover script.
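
For illustration only, the reverse transformation can be sketched as
follows.  The sketch is in Python rather than the Perl that MHonArc
itself uses, and it works on plain dictionaries, not on a real
.mhonarc.db file.  The function name and the separator values standing
in for $X and $bs are assumptions, not taken from the MHonArc source.

```python
# Illustrative only: the reverse of what update_data_2_4_to_later does.
# X_SEP and BS_SEP stand in for MHonArc's $X and $bs; the values used
# here are assumptions, so check the MHonArc source before relying on them.
X_SEP = "\x1c"
BS_SEP = "\b"


def downgrade_lists_to_strings(table, sep):
    """Join list values back into single separator-delimited strings.

    Values that are already strings are left untouched, mirroring how
    the forward routine skips values that are already references.
    """
    for index, value in list(table.items()):
        if isinstance(value, list):
            table[index] = sep.join(value)
    return table


# Hypothetical sample data; real keys and values come from .mhonarc.db.
refs = {"msg2": ["msg0", "msg1"], "msg3": "msg2"}
follow_old = {"msg0": ["msg1", "msg2"]}

downgrade_lists_to_strings(refs, X_SEP)
downgrade_lists_to_strings(follow_old, BS_SEP)
```

The same join would apply to the Derived data, using the $X separator.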

I apologize for the problems that exist.
When it rains, it pours.

--ewh






Re: [Gossip] Re: Mhonarc problems at mail-archive.com

2001-11-11 Thread Earl Hood

On November 10, 2001 at 00:04, Jeff Breidenbach wrote:

 Version v2.5 avoids this problem since HEADER and FOOTER resources
 are no longer supported.
 
 I downgraded back to mhonarc 2.4.9 to see if it would help with
 performance problems.

Was there a difference?

 In fact, the time sequence went like this:
 
1) 2.4.9 + 2.5.0 config
2) 2.4.9 + 2.5.0 config
3) 2.5.0 + 2.5.0 config
4) 2.4.9 + 2.5.0 config
 
 So I'm not shocked if there are a few hiccups...

There will be some definite hiccups since the 2.5.0 generated date
index files will not contain the special comment declarations that
2.4.9 would need to update the date index.  I did not think about
issuing warnings when trying to downgrade to previous versions.

You can regen the index file by doing something like the following:

mhonarc -editidx -nomsgpgs ...

The -nomsgpgs will cause mhonarc to skip editing message file
pages.  Therefore, only index pages will be regenerated.

IMPORTANT: If your main index page (in this case the date index)
is screwed up, you will have to delete it first if using v2.4.9, or
earlier, of MHonArc.

--ewh






Re: [Gossip] QA

2001-08-31 Thread Earl Hood

On August 30, 2001 at 23:05, Jeff Breidenbach wrote:

 Q: Thread linking (<-- Thread -->) doesn't work half the time.
 A: Give a specific URL and I will investigate.

I should follow up and state that if you are at the end of
thread boundaries, the <-- and --> will jump you to adjacent
discussion threads (assuming Jeff is using MHonArc's normal
$TNEXT$ and $TPREV$ variables).

Also, since a window of certain size is being used to only provide
an index for the latest set of messages, if there exists a really
long discussion thread, breaks can occur in the <-- and -->
links for messages in the thread that are outside the window since
messages are no longer updated (but still searchable) once they
are outside of the window.  A limitation of the poor man's windowing
technique, but it does speed up mail-archive mail processing
considerably.
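
As a rough illustration of why the links break, here is a toy model in
Python.  It is not how mail-archive or MHonArc actually implement the
window.  The function name, the use of sequential message numbers, and
the rule that a page is rewritten only while it is among the latest
`window` messages are all assumptions made for the sketch.

```python
def thread_next_links(thread, window):
    """Return the 'next in thread' link each message page ends up with.

    thread: message numbers of one discussion thread, in arrival order.
    window: how many of the most recent messages still get re-edited.
    A page only learns about its follow-up if it is still inside the
    window when that follow-up arrives.
    """
    links = {}
    for prev, new in zip(thread, thread[1:]):
        if new - prev < window:
            links[prev] = new   # page for 'prev' is rewritten with the link
        else:
            links[prev] = None  # page is frozen, so the link is missing
    return links


# Hypothetical thread: messages 1, 5 and 30 with a window of 10 messages.
links = thread_next_links([1, 5, 30], window=10)
```

In this model message 1 links forward to message 5, but message 5 never
gets a link to message 30 because it has left the window by the time
message 30 arrives.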

One potential change that could help in navigation is to use the
TSLICE features of MHonArc to provide a mini-thread listing
on each message page.  I can contribute some layout settings
if Jeff is interested.  All I should need is a copy of the
resource files mail-archive uses.

--ewh






Re: Archiving by month

1999-09-06 Thread Earl Hood

On September 5, 1999 at 13:49, Jeff Breidenbach wrote:

 Paul, see how attachments end up in subdirectories, for example
 http:[EMAIL PROTECTED]/msg00459.html
 The default rcfile puts attachments in a subdirectory, with a .dir
 extension. You are probably overriding the MIMEArgs directive, or
 perhaps .html attachments are treated differently.

I think the problem is a change in mhexternal.pl of MHonArc in
the naming of attachment subdirectories.  It appears I did not
mention it in CHANGES, but here is the SCCS delta comment on it:

D 2.7 99/06/25 13:59:18+05:00 [EMAIL PROTECTED] 22 21  3/3/228
P /home/ehood/work/perl/MHonArc/lib/mhexternal.pl
C Removed addition of .dir to subdir.

According to the date, it was applicable for v2.4.0 or v2.4.1.  I
cannot remember the exact reason for the change, but some user had
problems with the .dir so I figured no harm (ha ha) would occur
if I removed the .dir.

I do not know how htdig works, but can it index a specified list of
file types (e.g., .html, .txt), or can you specify a regex/glob mask
(or match) to control indexing?

--ewh