[loewis@informatik.hu-berlin.de: Bug#131512: Need UTF-8 archives]

2002-02-01 Thread Jeff Breidenbach


I received a UTF-8 feature request/patch [1] for MHonArc from a Debian
GNU/Linux user. Any comments? Is this something that MHonArc might
consider incorporating directly?

Cheers,
Jeff

1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=131512&repeatmerged=yes




big5 l10n

2001-12-09 Thread Jeff Breidenbach


Any comments on MHonArc compatibility with the big5 traditional
Chinese character set? I recently received a big5 localization
(http://mail-archive.com/rcfile.tw) but am not having particularly
good luck with it so far, e.g.

http://www.mail-archive.com/control%40anc.dyndns.org/msg00129.html

Any suggestions or known incompatibilities? Do I need to somehow
escape the big5 data in the rcfile? I get lots of warnings suggesting
that something is not going well with the parsing, like:

 Warning: Unrecognized variable: "H: "
 Warning: Unrecognized variable: " <"
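
One way to narrow down warnings like these is to list every $...$ span in
the rcfile that does not look like a resource variable. The sketch below is
only a diagnostic aid, not part of MHonArc. It assumes variables look like
$NAME$, $NAME(args)$ or $NAME:30$ and that "$$" is a literal dollar sign;
the function name and the pattern are hypothetical.

```python
import re

# Assumption: a well-formed variable is a name, optionally followed by
# "(args)" and/or a ":modifier". Anything else between two "$" is suspect.
VALID = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*(\([^()]*\))?(:[0-9A-Za-z]*)?$")


def suspicious_variables(rcfile_bytes):
    """Return (line_number, token) pairs for $...$ spans that look wrong."""
    hits = []
    # latin-1 maps every byte to one character, so big5 bytes never raise.
    text = rcfile_bytes.decode("latin-1")
    # Split on "\n" only; str.splitlines() would also split on bytes
    # such as 0x85 that can occur inside multi-byte characters.
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.replace("$$", "")  # drop literal dollar signs
        for token in re.findall(r"\$([^$]*)\$", line):
            if not VALID.match(token):
                hits.append((lineno, token))
    return hits
```

Calling suspicious_variables(open("rcfile.tw", "rb").read()) would give the
line numbers to inspect by hand; the file name here is just the one from
the URL above.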

Any comments appreciated, especially from those successfully
using MHonArc with big5 or a similar character set.

Cheers,
Jeff




request, Debian sid now running mhonarc 2.5.1

2001-11-18 Thread Jeff Breidenbach


Hi all,

FYI, two Debian/MHonArc notes:

1) I've got an interesting feature request regarding  and
   default configurations and want to pass it on; see bug #115991
   (http://bugs.debian.org) for commentary.

2) MHonArc 2.5.1 is now part of Debian/Sid (unstable) and should
   make its way into Debian/Woody (testing) in ~14 days.

-Jeff


--- Start of forwarded message ---
Envelope-to: [EMAIL PROTECTED]
From: Debian Installer <[EMAIL PROTECTED]>
To: Jeff Breidenbach <[EMAIL PROTECTED]>
X-Katie: $Revision: 1.59 $
Subject: mhonarc_2.5.1-1_i386.changes INSTALLED
Sender: James Troup <[EMAIL PROTECTED]>
Date: Sun, 18 Nov 2001 15:05:17 -0500


Installing:
mhonarc_2.5.1-1.dsc
  to pool/main/m/mhonarc/mhonarc_2.5.1-1.dsc
mhonarc_2.5.1-1_all.deb
  to pool/main/m/mhonarc/mhonarc_2.5.1-1_all.deb
mhonarc_2.5.1-1.diff.gz
  to pool/main/m/mhonarc/mhonarc_2.5.1-1.diff.gz
mhonarc_2.5.1.orig.tar.gz
  to pool/main/m/mhonarc/mhonarc_2.5.1.orig.tar.gz
Announcing to [EMAIL PROTECTED]
Closing bugs: 


Thank you for your contribution to Debian.
--- End of forwarded message ---




upgrade/downgrade adventure

2001-11-17 Thread Jeff Breidenbach


Ok, I think mha-dbrecover is the final missing piece, and I am now
getting back in business. It seems to completely fix smaller archives,
and is currently chugging along automatically (woohoo!) on a
50k-messages archive.

-Jeff




patch applied to Debian/MHonARC

2001-11-14 Thread Jeff Breidenbach


The OTHERINDEXES fix has been uploaded to Debian autobuilders and
should hit Debian unstable tomorrow, at least on i386. 

Cheers,
Jeff




Re: mhonarc 2.5.0 infinite loop on index creation?

2001-11-13 Thread Jeff Breidenbach


>Jeff, do you use OTHERINDEXES?  If so, this could be a culprit in
>your performance problems.

Yes, I use OTHERINDEXES for a very short rdf/rss index.
Please note that I've not had any symptoms comparable to
what Cygnus is reporting -- although at this point I'm in
no position to confirm anything for sure.

-Jeff




corruption problem

2001-11-12 Thread Jeff Breidenbach


Mhonarc gurus,

I seem to be experiencing a systematic corruption problem.  For
example, in the half-dozen most recent entries of this date index
[1], the message pages are nonexistent and listed under the same URL.

The version of mhonarc running on this archive has been
   2.early -> 2.4.9 -> 2.5.0 -> 2.4.9 -> 2.5.0

MHonArc 2.5.0 is not generating any warnings and is returning a good
return code. What steps are suggested for diagnosis, and are there any
suggestions for a fix? Rebuilding the archive from scratch is possible
but not desirable, due to the large number of archives
affected. Currently my top priority is stabilizing the system.

Thanks in advance for any suggestions.

Cheers,
Jeff


[1] http://www.mail-archive.com/gossip%40jab.org/maillist.html




Re: [Gossip] Re: Mhonarc problems at mail-archive.com

2001-11-11 Thread Jeff Breidenbach


>> I downgraded back to mhonarc 2.4.9 to see if it would help with
>> performance problems.
>
>>Was there a difference?

I think so, although there were a lot of other things making the
determination unclear. For starters: A near full filesystem, a runaway
process consuming one of the CPUs, corrupted index pages, etc.

One thing I notice is 2.4.9 seems to have a bounded time (~10 seconds)
for putting one new message in a 1000 message archive. 2.5.0 seems to
be less bounded. I also notice very long thread slices on some message
pages. For example, see the bottom of:

   http://www.mail-archive.com/mhonarc%40ncsa.uiuc.edu/msg02482.html

This makes me suspect I am taking a performance hit from 2.5.0, at
least the way I have it configured. I've just gotten to the point
where I've repaired the index pages and am ready to start running
2.5.0 again; I'll keep you posted.

-Jeff

The rcfile is accessible off http://mail-archive.com/faq.html
(src/rcfile.int)




Re: [Gossip] Re: Mhonarc problems at mail-archive.com

2001-11-10 Thread Jeff Breidenbach


>Version v2.5 avoids this problem since HEADER and FOOTER resources
>are no longer supported.

I downgraded back to mhonarc 2.4.9 to see if it would help with
performance problems.

In fact, the time sequence went like this:

   1) 2.4.9 + 2.5.0 config
   2) 2.4.9 + 2.5.0 config
   3) 2.5.0 + 2.5.0 config
   4) 2.4.9 + 2.5.0 config

So I'm not shocked if there are a few hiccups...

-Jeff




new maintainer, Debian package of MHonArc

2001-10-17 Thread Jeff Breidenbach


This is a quick heads up that I'm now the maintainer for Debian's
MHonArc package.

Cheers,
Jeff




ANNOUNCE: MHonArc v2.4.8

2001-04-18 Thread Jeff Breidenbach


This point release is extremely helpful and addresses 
several real world problems. Earl, you kick butt. 

Jeff




image Content-Type statistics

2001-01-07 Thread Jeff Breidenbach


Here are some rough statistics as promised -- essentially I am seeing
that MIME type correctness for images varies a lot across different
mailing lists. For example:

One list archive with about 6000 image attachments has:

  51   application/octet-stream
1134   image/pjpeg
4866   image/jpeg
  18   image/jpg

Another list archive with about 100 image attachments has:

  75   application/octet-stream
   0   image/pjpeg
  15   image/jpeg
   0   image/jpg

As you can see, some lists are better than others in terms
of proper Content-Type labeling. Interestingly, I didn't see
much weirdness beyond application/octet-stream -- maybe I just
didn't look hard enough.
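
For anyone who wants to repeat the count on their own lists, here is a
minimal sketch in Python. It is not the script used for the numbers above.
It assumes the raw mail is available as parsed messages (for example from
the standard mailbox module), and the suffix list is my own guess at what
counts as an image.

```python
from collections import Counter

# Assumption: these file suffixes identify image attachments that were
# mislabeled with a generic Content-Type.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".gif", ".png")


def image_type_counts(messages):
    """Tally the declared Content-Type of every image-like MIME part."""
    counts = Counter()
    for msg in messages:
        for part in msg.walk():
            ctype = part.get_content_type()
            name = (part.get_filename() or "").lower()
            if ctype.startswith("image/") or name.endswith(IMAGE_SUFFIXES):
                counts[ctype] += 1
    return counts
```

Passing mailbox.mbox("list.mbox") as the messages argument should work,
since mbox messages support walk(); "list.mbox" is a placeholder path.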

Jeff




Re: mhexternal.pl switch, message deletion switch

2001-01-05 Thread Jeff Breidenbach


>You could easily do the following to exclude images:

Ok, I just set the image/jpeg filter to m2h_null. I'll report in a few
days what percent of .jpeg and .jpg files this actually kills off.

>BTW, what do you do about the index pages?  I.e.  The message will
>still be "removed" wrt index page generation, so the only way to get
>to the files would be from search results.

That's exactly the situation, and so far, it is working very well.

Cheers,
Jeff




mhexternal.pl switch, message deletion switch

2001-01-04 Thread Jeff Breidenbach


Earl,

1) Are you still thinking about modifying the mhexternal.pl to support
   exclusion of files based on filename regexp and/or content type? I'm
   about ready to declare a vendetta on image/jpeg.

2) Any chance of adding a resource to disable message deletion? (i.e.
   I can have a constant size .mhonarc.db through MAXSIZE but
   unlimited message files). I'm currently using patched code to
   achieve this.

Jeff




Re: Idea for the future

2000-11-29 Thread Jeff Breidenbach


> I have thought of this along time ago.  It is a change on some of
> the functional goals of MHonArc: no dynamic system is required to view
> archives.  I.e.  One can read MHonArc archives w/o the need of a server.

Let me count the ways I like static HTML files. 

  * simplicity
  * simplicity
  * their benefit from internet caching infrastructure
  * computational cheapness of serving them (disk is cheaper than CPU)
  * manipulability with many, many tools
  * benefit from OS level improvements

Some people, like me, despise databases and like static files a lot.
Some people are the other way around. Anyway, us static-file lovers are
a valid user set.

Jeff




Re: Invisible threads in lyx-devel@lists.lyx.org

2000-07-24 Thread Jeff Breidenbach


Hi Rae,

I think you are seeing an issue with MHonArc, where the URL of a thread
index page containing a particular message will change out from under you
if it is set to show the newest threads at the top of the page.

In essence, I think I can't help you except to say "link to messages,
not the index page." I don't think there's an easy solution from
the MHonArc side, but am CCing them just in case.

Cheers,
Jeff

--

Envelope-to: [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
Date: Mon, 24 Jul 2000 20:29:41 +1000
From: Allan Rae <[EMAIL PROTECTED]>
X-Accept-Language: en
To: [EMAIL PROTECTED]
Subject: Invisible threads in [EMAIL PROTECTED]
Content-Type: text/plain; charset=us-ascii

Hi,

I'm a part of the LyX Team and am writing the LyX Development News.
As such I like to provide references to threads as well as individual
emails in the archive.  I've noticed that some threads just aren't
appearing in the threaded list at all.  However the individual emails
can be searched for and found.

For example msg12127.html is the start of a thread however if I try to
link to it as:
http://www.mail-archive.com/lyx-devel@lists.lyx.org/#12127

the page that is shown doesn't have this email or this thread visible
anywhere on it. If I follow the "[earlier emails]" link a couple of
times I eventually reach a point where the email should be visible
(since these threads are ordered by message number of opening message)
it's not there.

There are a couple of other threads doing the same thing.

Regards,
Allan. (ARRae)




Re: why no META tag for charset?

2000-04-21 Thread Jeff Breidenbach


>A potential solution would be to put the different message parts into
>different files in the archive, and use the remainder of the message as
>a container for URLs to those files, mimicking the MIME message
>structure in HTML.  I haven't looked to see how/if one can do that in
>MHonarc, but this seems like a problem similar to archiving a multi-part
>with multiple graphic inclusions.

Sounds like that would work. I guess the competition is the
"do-nothing-and-let-break" approach, and the "convert-to-unicode" approach.
Each has distinct advantages.

>Rely on a graphical interface instead of text?

Grumble, grumble, grumble. I'd rather help make programs and protocols 
smart enough to deal with these sorts of (tractable) issues.

>BTW, while I applaud the desire to display localized headers I hope that
>any reply/follow-up interface is sending the canonical RFC 822 & later
>headers and keywords "on the wire", and not helping create messages like:
>
>Subject: Re: Sv: Re: Ab: Re:

Not a problem for me. Mail-Archive.com supplies a mailto: URL with an
embedded, unadulterated subject to the user's existing MUA.




Re: why no META tag for charset?

2000-04-21 Thread Jeff Breidenbach


>This is the problem, HTML does not support mixed character sets.
>Also, the charset affects the entire HTML document.  Therefore, your
>resource settings would have to conform with the charset, and this
>can be a big problem if messages existing in the archive have different
>specified charsets.  It would be hard to guarantee that all messages
>will use the same charset.

I think I understand ... is this right?

If a single email contains two different character sets,
you're screwed, I understand that.

If two emails are received, each with a different character set
1) you are screwed on index pages, which will have a bunch
   of subject lines from different character sets
2) you are screwed on message pages, because navigational aids
   like the word "follow-ups" will be in a different character set
   from the messages.

Ok, so I see how unicode would magically fix everything. But, imagine that
wasn't available, and I get a message in an unknown character set. 

The result is an un-meta-tagged message page (which will default to either
iso-8859-1 or some browser heuristic). Assuming iso-8859-1, we get good
navigational aids and an unreadable message. Had we used a meta tag the
message would be readable and we'd lose the navigational aids. Yuck, yuck,
yuck, it's a choice between two evils. Given just those options, I think a
message page meta tag (generated from the corresponding email's character
set) would be better, though.

Converting to unicode won't be graceful either. If one converts everything
unknown to unicode, I bet in practice a lot of iso-8859-1 messages will go to
unicode and be unrenderable by legacy browsers. I guess legacy browsers
will have to be replaced.
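
To make the "convert-to-unicode" option concrete, here is a minimal Python
sketch. It is not how MHonArc does it; the function name is hypothetical,
and falling back to iso-8859-1 for unknown or broken charsets is an
assumption that matches the default discussed above.

```python
def to_utf8(body_bytes, declared_charset):
    """Re-encode a message body as UTF-8, using its declared charset."""
    try:
        text = body_bytes.decode(declared_charset or "iso-8859-1")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not fit the declared one:
        # iso-8859-1 accepts any byte, so this fallback cannot fail.
        text = body_bytes.decode("iso-8859-1")
    return text.encode("utf-8")
```

With every page in one encoding, the index pages and navigational aids no
longer compete with the message body for the page's single charset.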

Jeff




Re: MHonarc archives with internationalisation

2000-04-20 Thread Jeff Breidenbach


I wrote a large (English) resource file that overrode almost everything.
Then I wrote a short sed script to create a derived resource file, replacing
all the English words with the localized language. As far as I can tell,
this approach is the most reasonable way to support localization, if you
need to support multiple languages. My files wouldn't be good for
a "cookbook example" though, because I also do a lot of custom formatting
in the resource file.
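
The derivation step can be sketched in a few lines. This is a Python
stand-in for the sed script, not the script itself; the function name is
hypothetical and the translation table is whatever your translator gives
you.

```python
import re


def localize(rc_text, translations):
    """Replace whole English words in a resource file with translations."""
    # Longest keys first, so a longer phrase wins over a word inside it.
    keys = sorted(translations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: translations[m.group(1)], rc_text)
```

The match is case sensitive and limited to whole words, so a label such as
"Subject" is translated while an upper-case variable like $SUBJECTNA$ is
left alone. It is still worth diffing the derived file against the English
one before using it.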

>I am also interested in a localization, for the german language. Is there any
>"cookbook example" which describes only those rcfile settings which are
>necessyry for this job? There are examples for Mhonarc in non-english languages
>(e.g. dutch), but I would like to see just the "minimum requirements" for a
>localization.




why no META tag for charset?

2000-04-20 Thread Jeff Breidenbach


Recently, I've been on an internationalization/localization kick.
I just read the relevant portion of the HTML specification
and found it refreshingly clear.

Let's assume I want to process an email with some weird character set,
like ISO646-SE. It appears the right thing for MHonArc to do is
produce an HTML document that includes:

  <META http-equiv="Content-Type" content="text/html; charset=ISO646-SE">

But as far as I can tell, MHonArc won't produce that meta tag. Thus
the character set information is lost, which can result in a difficult
to render web page.
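
As an illustration of what I mean, here is a small Python sketch that
derives such a tag from a message's own Content-Type header. It is only a
sketch of the idea, not MHonArc code; the function name and the iso-8859-1
default are assumptions.

```python
from html import escape


def charset_meta_tag(msg, default="iso-8859-1"):
    """Build a charset META tag from an email.message.Message."""
    # get_content_charset() returns the lower-cased charset parameter,
    # or None when the header has none (for example a multipart message).
    charset = msg.get_content_charset() or default
    return ('<META HTTP-EQUIV="Content-Type" '
            'CONTENT="text/html; charset={}">'.format(escape(charset, quote=True)))
```

A multipart message has no single charset at the top level, so it falls
back to the default here, which is exactly the multiple-character-set
limitation mentioned below.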

I suspect there is a reason for this, but I'm not sure what it
is.  (I know there will be an issue if email contains multiple
character sets, since this is not supported in HTML documents.)
 
Jeff


-
HTML4 specification, character sets:
http://www.w3.org/TR/html4/charset.html

IANA Registry of character sets:
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

Character Set Converters Resource (MHonArc)
http://www.mhonarc.org/MHonArc/doc/resources/charsetconverters.html




MHonarc archives with internationalisation

2000-04-18 Thread Jeff Breidenbach

Regarding customization, you will have to create an rcfile to do the job. I
chose to override  and put the To/From/Subject customizations
in there.

Coincidentally, I'm also working on a French localization right now. My
translator suggested Subject --> Sujet, but you mentioned Subject --> Objet.
Which is better?

Cheers,
Jeff




Re: Namazu & MHonArc

2000-04-07 Thread Jeff Breidenbach

For the record, htdig can index through the local filesystem
(bypassing the HTTP protocol.) You are correct about htdig
not supporting multi-byte characters.




report from the trenches, perl compiler

2000-02-29 Thread Jeff Breidenbach


Hi all,

My archive of linux-kernel hit 112,000 messages.

It was way slow.
It was causing out of memory errors on a 256MB machine.

Rapidly advancing computer hardware wasn't enough. I finally broke
down and switched it to monthly indexing. Works like a charm, of
course.
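
The splitting itself is simple. Here is a minimal Python sketch that
buckets messages by the month in their Date header; it is not the script I
run, and the "YYYY-MM" key format is just one possible naming scheme for
the per-month archives.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def group_by_month(date_headers):
    """Map 'YYYY-MM' to the Date headers that fall in that month."""
    buckets = defaultdict(list)
    for header in date_headers:
        month = parsedate_to_datetime(header).strftime("%Y-%m")
        buckets[month].append(header)
    return dict(buckets)
```

Each bucket then becomes its own small archive, so adding a message only
touches one month's database. Real mail contains malformed Date headers,
which make parsedate_to_datetime raise, so production code needs a
fallback for those.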

Anyway, just wanted to report from the trenches that things are well
(or at least the line is holding, albeit with effort). It will be
interesting to see how long I can stay in the scalability game.

Jeff



is windowing still kicking around?

1999-11-11 Thread Jeff Breidenbach


A long time ago, there was a discussion of making MHonArc
work with windowing - for example, having new messages only
thread against the most recent thousand messages in an archive.
The idea was that things might go quicker for small additions to
large archives. 
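
To show what I mean by windowing, here is a rough Python sketch. It is not
MHonArc code and the class is hypothetical: it remembers only the most
recent N message-ids and threads a new message against those alone.

```python
from collections import OrderedDict


class ThreadWindow:
    """Thread new messages against only the most recent `size` messages."""

    def __init__(self, size=1000):
        self.size = size
        self.recent = OrderedDict()  # message-id -> thread root id

    def add(self, msg_id, references):
        """Record msg_id and return the root of the thread it joins."""
        root = msg_id  # starts a new thread unless an ancestor is found
        for ref in reversed(references):  # nearest ancestor first
            if ref in self.recent:
                root = self.recent[ref]
                break
        self.recent[msg_id] = root
        if len(self.recent) > self.size:
            self.recent.popitem(last=False)  # forget the oldest message
        return root
```

The cost of adding a message then depends on the window size rather than
the archive size. The price is that a reply to a message older than the
window starts a new thread.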

Is this idea still kicking around? The reason I ask, is that I am
thinking about MHonArc performance again, and I see the possible
improvements as:

a) switch to a better filesystem like reiserfs
b) faster hardware, both storage and processor(s) 
c) split up big archives

If windowing is a possibility in the long term, I'll
let a and b keep me busy.

Jeff



Re: how do I make sure that a message will be

1999-11-11 Thread Jeff Breidenbach

>A lot of mail clients don't even know how to parse the
>"&subject=" argument; to my knowledge there are none which attempt to
>add arbitrary headers.

Lynx 2.8.2rel.1 can parse mailto: URLs in accordance with RFC 2368.
I'm using lynx right now, from the MHonArc produced mailto: at
http://mail-archive.com/mhonarc%40ncsa.uiuc.edu/msg01506.html
and I think it should pick up the embedded In-Reply-To: field fine.
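
For reference, a mailto: URL of this kind can be built as in the Python
sketch below. It follows the RFC 2368 rule that header values go in the
query part and are percent-encoded; the function name and the example
addresses are made up.

```python
from urllib.parse import quote


def mailto_url(to, subject, in_reply_to):
    """Build a mailto: URL carrying Subject and In-Reply-To headers."""
    # safe="" forces every reserved character in a header value to be
    # percent-encoded, including "<", ">", "@", "?" and "&".
    query = "subject={}&In-Reply-To={}".format(
        quote(subject, safe=""), quote(in_reply_to, safe=""))
    return "mailto:{}?{}".format(quote(to, safe="@"), query)
```

For example, mailto_url("list@example.org", "Re: rcfile confusion?",
"<abc@example.org>") yields a URL whose query starts with
subject=Re%3A%20rcfile%20confusion%3F. Whether a given mail client honors
the In-Reply-To part is up to the client.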

Jeff 



Re: rcfile confusion?

1999-09-28 Thread Jeff Breidenbach


>BTW, you can use $MSG$ to get the filename for a given message if
>for some reason $A_HREF$ does not work for you.

Ah, perfect. The problem with $A_HREF$ was not the relative URL,
but rather the inclusion of the word "HREF=" in the output.

Jeff



Re: rcfile confusion?

1999-09-27 Thread Jeff Breidenbach


Hmmm... XML question #2.

Is it possible to produce something like the following:

  http://host/msg5.html

I see $A_ATTR$ and $A_HREF$ available for the LITEMPLATE resource, but
nothing that appears to give the unadulterated message URL. My guess
is that it is not an available resource variable, right?  As far as I
can tell, this is the only thing holding MHonArc back from being able
to produce RDF files.

Jeff



Re: rcfile confusion?

1999-09-27 Thread Jeff Breidenbach


>It looks like the bug fix is much easier than I was expecting.
>I have attached patch to mhindex.pl
>(SCCS ID: mhindex.pl 1.4 99/06/25 14:21:22), that hopefully fixes
>the problem.


Brilliant - I applied the patch and it works great.
Thanks, Earl!

Jeff



Re: higher memory requirements in 2.4.0 ? [isolated]

1999-06-29 Thread Jeff Breidenbach


>Note, my system configuration is different from yours.  I am running
>Perl 5.005_03 on RH 5.2 w/2.2.9 kernel.  There is a possibility that
>Perl 5.004 has some memory leaks exposed by MHonArc v2.4.0.

The perl version that is recommended in RedHat 5.2's errata notes is:

# rpm -q perl MHonArc
perl-5.004m7-1

Running under this perl, all memory is consumed every time I try
asking MHonArc 2.4.0 to add to the big database. So, I tried upgrading
perl. Scrounging the net for a slightly newer version of perl, I found
this one in ftp://contrib.redhat.com:

# rpm -q perl
perl-5.005_02-1

# perl -v

This is perl, version 5.005_02 built for i386-linux-thread 
[...]

MHonArc 2.4.0 worked fine under this perl and contained itself to
about 143 meg. (More or less, I get the number from watching top
during the run.)  Switching back to the old version of perl caused the
problem to reappear.  I'm not going to draw any broad sweeping
conclusions from this experiment, but it's safe to say I will be
sticking with the newer version of perl.

Jeff



higher memory requirements in 2.4.0 ?

1999-06-26 Thread Jeff Breidenbach


I've been putting 2.4.0 through some paces.

Upgrading to 2.4.0 caused out of memory errors when adding to a 60,000
document archive. In version 2.3.3, similar operations only required
about 130 meg, or half of available memory. Running out of memory is
serious, because the lack of memory effectively stops the kernel from
starting new processes (causing commands like 'ps' to
coredump). This makes the machine unusable until MHonArc finally exits
with return code 137. Smaller archives appear to work correctly.
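
A note on the return code: by shell convention a status above 128 means
the process died from signal (status - 128), so 137 corresponds to signal
9. A small Python sketch for decoding such codes in a wrapper script
follows; the function name is hypothetical and it assumes a POSIX system.

```python
import signal


def describe_exit(code):
    """Turn a shell-style exit status into a readable description."""
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number, report the plain status
    return "exited with status {}".format(code)
```

Here describe_exit(137) gives "killed by SIGKILL", which is consistent
with the process being killed from outside rather than exiting on its own.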

No changes were made to the (complicated) rcfile, except the removal
of a timezone resource. No changes were made to the (complicated)
command line arguments.

I'd be happy to supply the core file, database file, rcfile, command
line options, run profiling or debugging tools (need instructions) or
provide remote access to a machine that demonstrates the
problem. Here's some system information.

Jeff

---

# mhonarc -V
  MHonArc v2.4.0 (Perl 5.00405)
  Copyright (C) 1995-1999  Earl Hood, [EMAIL PROTECTED]
  MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only
  under the terms of the GNU General Public License, which may be found in
  the MHonArc distribution.

# rpm -q perl MHonArc
perl-5.004m7-1
MHonArc-2.4.0-1

# uname -a
Linux marmot.jab.org 2.0.36 #1 Tue Dec 29 13:11:13 EST 1998 i586 unknown

# free
             total       used       free     shared    buffers     cached
Mem:        257048      41856     215192      16804      14348      14736
-/+ buffers/cache:      12772     244276
Swap:        72256       7832      64424


From system logs, MHonArc output is reported as 'mailme'. The interpreter
messages are from me trying to run programs as root and then as a
regular user during the problem time.

Jun 26 20:57:54 marmot mailme: Warning: Database (2.3.3) != program (2.4.0) version. 
Jun 26 20:59:27 marmot mailme: Out of memory! 
Jun 26 21:00:20 marmot kernel: Unable to load interpreter
Jun 26 21:00:52 marmot kernel: Unable to load interpreter
Jun 26 21:01:58 marmot last message repeated 3 times
Jun 26 21:02:41 marmot last message repeated 4 times
Jun 26 21:03:20 marmot last message repeated 3 times
Jun 26 21:03:21 marmot PAM_pwdb[18495]: (su) session closed for user root
Jun 26 21:03:24 marmot kernel: Unable to load interpreter
Jun 26 21:03:36 marmot last message repeated 2 times
Jun 26 21:04:09 marmot mailme: MHonArc returned exit code 137 for 
[EMAIL PROTECTED]





language detection

1999-06-25 Thread Jeff Breidenbach


I was thinking about automatic language detection. If mailing
list traffic was predominantly Icelandic, I would like to automatically
ask MHonArc to switch over to a resource file localized for Icelandic.

Being completely naive, I pulled up a few non-English emails and
looked for some line in the headers that identified the language. How
incredibly depressing. The only relevant headers I found were the
character set, which appears common for dozens of languages. The only
other header clue was the domain of the list server, which is hardly a
sure thing, given the pervasiveness of both the English language and
the .com domain name. What do people do for automatic language
detection for email? Are they stuck with scanning the body for common
dictionary words?  Bleah!!
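
For what it's worth, the dictionary-word approach can be sketched in a few
lines of Python. This is a naive illustration, not a tested detector: the
word lists are tiny samples I picked by hand, and the function name and
threshold are arbitrary.

```python
import re

# Assumption: a handful of very common words per language is enough to
# tell these three apart on ordinary mailing-list text.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "is": {"og", "að", "er", "sem", "ekki", "fyrir", "það", "við"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "für", "mit"},
}


def guess_language(body, minimum_hits=2):
    """Return the language whose common words occur most often, or None."""
    words = re.findall(r"\w+", body.lower())
    scores = {lang: sum(word in common for word in words)
              for lang, common in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= minimum_hits else None
```

Scoring a whole list's traffic rather than a single message should make
the guess steadier, since quoted English text and short messages would
otherwise throw it off.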

So the questions are:

 a) Am I missing something obvious?

 b) Are there any languages that are easily detected
    (perhaps by a unique character set?) If so, are
    those languages supported by MHonArc? Oh, and what
    are they?

I guess I'll have to scuttle the whole thing; if so that's too bad,
since I really think it would be great to automatically customize
to a particular language.

Jeff

PS Typical non-English language email headers appended.



Return-Path: [EMAIL PROTECTED]
Delivery-Date: Tue May 25 07:50:36 1999
Return-Path: <[EMAIL PROTECTED]>
Received: from jab.org (u251.varesearch.com [209.81.8.251])
by marmot.jab.org (8.8.7/8.8.7) with ESMTP id HAA28750
for <[EMAIL PROTECTED]>; Tue, 25 May 1999 07:50:35 -0700
Received: from mars.mmedia.is (mars.mmedia.is [193.4.192.20])
by jab.org (8.8.7/8.8.7) with ESMTP id KAA16373
for ; Tue, 25 May 1999 10:48:37 -0400
Received: (from mail@localhost)
by mars.mmedia.is (8.9.0/8.9.0-MMEDIA) id AAA03873
for kde-isl-list; Tue, 25 May 1999 00:23:36 GMT
Received: from mailer.isholf.is (pop.isholf.is [194.105.226.2])
by mars.mmedia.is (8.9.0/8.9.0-MMEDIA) with ESMTP id AAA03857
for <[EMAIL PROTECTED]>; Tue, 25 May 1999 00:23:31 GMT
Received: from [157.157.168.204] by mailer.isholf.is (NTMail
4.20.0009/NU2631.00.d894e447) with ESMTP id kgkacaaa for 
<[EMAIL PROTECTED]>; Tue, 25 May 1999 15:43:15 +
Message-ID: <[EMAIL PROTECTED]>
Date: Tue, 25 May 1999 15:41:44 +
From: Jn Gumundsson <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
X-Mailer: Mozilla 3.04 (Win95; I)
MIME-Version: 1.0
To: [EMAIL PROTECTED]
Subject: [kde-isl]: Forritun fyrir KDE Hvar eru grunnkarnir!!
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: [EMAIL PROTECTED]
Precedence: normal
Organization: Skgrkt R

Einarsson [...]



successfully processed 10e6 emails

1999-06-24 Thread Jeff Breidenbach


MHonArc just processed its one millionth email on my computer.
As you can imagine, I'm extremely pleased. What great software!

It's been a lot of fun scaling up. Here's what I learned from my
experience over the last year and a half, from a technical standpoint.
The system is a single PC with an AMD K6-II processor, 256 megs of
ram, and two 16 gig IBM IDE disks.


MHonArc:

  * MHonArc, in batch mode, provides O(n) performance no
    matter how big an archive gets (tested to 60k)

  * The risk of an orphaned lock file is too great. It was better to
    use -nolock and manage concurrency myself.

  * It was better to buy RAM than to use -savemem

  * Once in a blue moon, perl processes can crash and core dump.
    No big deal if you remember to check process return values.

  * htdig makes for an excellent search engine for MHonArc pages
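
On the lock file and return value points, here is one possible way to
serialize runs and check the result, sketched in Python. It is not the
wrapper I actually use: the function name is hypothetical, it assumes a
Unix system with flock, and the command to run is whatever you pass in.

```python
import fcntl
import subprocess


def run_locked(lock_path, argv):
    """Run argv while holding an exclusive lock; return its exit status."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        # The kernel releases the lock when lock_file is closed or this
        # process dies, so a crash cannot leave an orphaned lock behind.
        return subprocess.run(argv).returncode
```

The caller should treat any nonzero status as a failed run and keep the
raw message for a retry. Note that subprocess reports death by signal as
a negative number rather than the shell's 128 + n.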
  

Stock redhat linux 5.2:

  * The default open files limit (1000) is too low.

  * Mounting a hard disk takes minutes, e2fsck can take an hour, and
    ls can take quite a few seconds.

  * IDE disk throughput increased when I tweaked settings with hdparm

  * When you do a lot of writing to system logs, syslogd starts
    hogging 25% of the processor. Rotating logfiles daily fixes
    this problem.

  * Better to put some limits on 'updated'

  * People can break into a stock system (due to security holes
    in software bundled with the OS)


Other:

  * There are certain emails which can kill nmh.



---

 * My friend Brian Semmes was a math major. For some reason he
took the introductory electrical engineering class, and it wasn't
pretty.  The professor said things like, "This resistor has 10^6 ohms;
heck, 10^6 is practically infinity, so we'll just substitute infinity
into this equation..." It drove him crazy, and I think of him
whenever I hear the word "million". 







along with the spam proofing

1999-06-15 Thread Jeff Breidenbach


http://validator.w3.org doesn't give MHonArc message pages a thumbs up
because the first line has to be a DOCTYPE declaration, such as:

   

I think the comments MHonArc places at the top of the page are invalid
for any version of HTML above 2.0. This has no practical significance,
and I understand that a fixed position at the top of the file is
helpful for machine readability. However, since the format of these
comments is rumored to be changing (due to spam shielding) in the
future versions of MHonArc, this might be a window of opportunity.

Jeff



Anti-Spam Measures (RFC)

1999-06-12 Thread Jeff Breidenbach



> My preference for the archives I maintain would be to 
> have a hook in mhonarc that would allow me to apply
> my own address-modifying subroutine.  Rot13 is probably
> sufficient to stop the harvesters, but doesn't provide
> real privacy.

Are you thinking of hooks for strong encryption in place of
rot13? Perhaps running the list through an anonymizing remailer
would be just as effective.

> I would also like to leave the message bodies in my archives
> untouched.

I would, too.

By the way, I benchmarked MHonArc the other day as it rebuilt a 60,000
message archive from raw email, 870 messages at a time. It took about
a day on a K6-2 300 with 256 MB of ram.

Interestingly, each chunk of 870 messages took about the same amount
of time to run, so in this mode of operation I'd say time requirements
are O(n) where n is the number of messages.
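
The back-of-the-envelope arithmetic, as a Python sketch (the function name
is hypothetical and "about a day" is taken as 24 hours):

```python
import math


def chunk_timing(total_messages, chunk_size, total_minutes):
    """Return (number of chunks, average minutes per chunk)."""
    chunks = math.ceil(total_messages / chunk_size)
    return chunks, total_minutes / chunks


# 60,000 messages in chunks of 870, over roughly 24 hours.
chunks, minutes_each = chunk_timing(60000, 870, 24 * 60)
```

That works out to 69 chunks at roughly 21 minutes each. Since the time per
chunk stayed flat as the archive grew, the total time is proportional to
the number of chunks, and so to the number of messages.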

Jeff



[PATCH] Address kerflundering

1999-06-11 Thread Jeff Breidenbach



I received the following patch to MHonArc after offering a small
bounty at the Free Software Bazaar. Overall, I got two offers to do
the patch within a week of the offer being posted. I'm impressed. This
patch was written by Alexis Mikhailov.

Jeff



Alexis Mikhailov <[EMAIL PROTECTED]> writes:

>Hello Jeff!
>
>Here is a patch against version 2.3.3 of MHonArc. I've checked it to some
>extent.
>
>Alexis


diff -ru MHonArc2.3.3/lib/mhamain.pl MH/lib/mhamain.pl
--- MHonArc2.3.3/lib/mhamain.pl Sun Nov  8 21:06:23 1998
+++ MH/lib/mhamain.pl   Fri Jun 11 17:18:44 1999
@@ -238,7 +238,8 @@
 
 ## Get here, we are processing mail folders
 
-local($mesg, $tmp, $index, $sub, $from, $i, $date, $fh);
+local($mesg, $tmp, $index, $sub, $from, $i, $date, $fh ,$fromaddrname,
+   $fromaddrdomain);
 local(%fields);
 
 $i = $NumOfMsgs;
@@ -255,7 +256,7 @@
$handle = $ADD;
 
## Read mail head
-   ($index,$from,$date,$sub,$header) =
+   ($index,$from,$date,$sub,$header,$fromaddrname,$fromaddrdomain) =
&read_mail_header($handle, *mesg, *fields);
 
if ($index ne '') {
@@ -303,7 +304,7 @@
}
print STDOUT "."  unless $QUIET;
$mesg = '';
-   ($index,$from,$date,$sub,$header) =
+   ($index,$from,$date,$sub,$header,$fromaddrname,$fromaddrdomain) =
&read_mail_header($fh, *mesg, *fields);
 
#  Process message if valid
@@ -347,7 +348,7 @@
MBOX: while (!eof($fh)) {
print STDOUT "."  unless $QUIET;
$mesg = '';
-   ($index,$from,$date,$sub,$header) =
+   ($index,$from,$date,$sub,$header,$fromaddrname,$fromaddrdomain) =
&read_mail_header($fh, *mesg, *fields);
 
if ($index ne '') {
@@ -667,6 +668,23 @@
 print STDOUT "\n"  unless $QUIET;
 }
 
+sub split_address {
+local($from) = @_;
+local($fromaddrname, $fromaddrdomain);
+local(@machines);
+
+$from =~ s/^.*\<(.*)\>.*$/$1/;
+$from =~ s/\(.*\)//;
+$from =~ s/^\s+//;
+$from =~ s/\s+$//;
+
+@machines = split /\!/, $from;
+if ($machines[-1] =~ /[@%]/)
+{
+   return split /[@%]/, $machines[-1];
+}
+return ($machines[-1], $machines[-2]);
+}
 ##---
 ## read_mail_header() is responsible for parsing the header of
 ## a mail message.
@@ -674,7 +692,7 @@
 sub read_mail_header {
 local($handle, *mesg, *fields) = @_;
 my(%l2o, $header, $index, $date, $tmp, @refs, @array);
-local($from, $sub, $msgid);
+local($from, $sub, $msgid, $fromaddrname, $fromaddrdomain);
 local($_);
 
 $header = &readmail::MAILread_file_header($handle, *fields, *l2o);
@@ -759,6 +777,7 @@
 foreach (@FromFields) {
next  unless $fields{$_};
$from = $fields{$_};
+   ($fromaddrname, $fromaddrdomain) = split_address($from);
last;
 }
 $from = 'No Author'  unless $from;
@@ -802,7 +821,7 @@
 &remove_dups(*refs);# Remove duplicate msg-ids
 $Refs{$index} = join($X, @refs)  if (@refs);
 
-($index,$from,$date,$sub,$header);
+($index,$from,$date,$sub,$header,$fromaddrname,$fromaddrdomain);
 }
 
 ##---
diff -ru MHonArc2.3.3/lib/mhrcvars.pl MH/lib/mhrcvars.pl
--- MHonArc2.3.3/lib/mhrcvars.plSun Nov  8 21:06:23 1998
+++ MH/lib/mhrcvars.pl  Fri Jun 11 17:12:21 1999
@@ -139,13 +139,17 @@
"";
last REPLACESW;
}
-   my($cnd1, $cnd2, $cnd3) = (0,0,0);
+   my($cnd1, $cnd2, $cnd3, $cnd4, $cnd5) = (0,0,0,0,0);
if (($cnd1 = ($var eq 'FROM')) ||   ## Message "From:"
($cnd2 = ($var eq 'FROMADDR')) ||   ## Message from mail address
-   ($cnd3 = ($var eq 'FROMNAME'))) {   ## Message from name
+   ($cnd3 = ($var eq 'FROMNAME')) ||   ## Message from name
+   ($cnd4 = ($var eq 'FROMADDRNAME')) || ## Message from user name
+   ($cnd5 = ($var eq 'FROMADDRDOMAIN'))) { ## Message from domain
my $esub = $cnd1 ? sub { $_[0]; } :
   $cnd2 ? \&extract_email_address :
-  \&extract_email_name;
+  $cnd3 ? \&extract_email_name :
+  $cnd4 ? \&extract_email_addr_name :
+  \&extract_email_addr_domain;
$canclip = 1; $raw = 1;
($lref, $key, $pos) = compute_msg_pos($index, $var, $arg);
$tmp = defined($key) ? &$esub($From{$key}) : "(nil)";
diff -ru MHonArc2.3.3/lib/mhutil.pl MH/lib/mhutil.pl
--- MHonArc2.3.3/lib/mhutil.pl  Sat Oct  3 23:07:54 1998
+++ MH/lib/mhutil.plFri Jun 11 17:12:42 1999
@@ -44,6 +44,28 @@
 $ret;
 }
 
+sub extract_email_addr_name 

address kerflundering

1999-06-04 Thread Jeff Breidenbach


I decided to try something a bit unusual, and submit an offer for a
small bounty at "The Free Software Bazaar" for a patch to MHonArc.
Mainly, I'm curious to see if and how their bounty system
works. Anyway, I thought it would be common courtesy to carbon copy to
this list. I hope I didn't offend anyone with this experiment.

Cheers,
Jeff

--

MHonArc is a popular GPL'd email to HTML converter written in Perl.  I
want a patch to add two new resource variables to MHonArc.  Patch
must follow guidelines below. Patch must be created with 'diff -uNr',
be shorter than 100 lines and apply cleanly to MHonArc 2.3.3 or
later. Patch may not destroy any existing functionality in MHonArc.
Final condition: submit patch to MHonArc mailing list.

Offer expires midnight, December 31, 1999, GMT.

Helpful references:
 http://www.mail-archive.com/mhonarc@ncsa.uiuc.edu/msg01047.html
 http://www.oac.uci.edu/indiv/ehood/mhonarc.html 

$25 to developer.

Jeff Breidenbach
[EMAIL PROTECTED]
phone: 908 210 9135 home
phone: 908 938 9600 x3010 work
http://www.jab.org (homepage)

=
The following advice is quoted from Earl Hood:
=

The two main functions to target are:

mailUrl() in mhutil.pl:
This function is used to convert addresses in converted
message headers.

replace_li_var() in mhrcvars.pl:
This is the general purpose function for doing resource
variable interpolation.  If any new resource variables are desired,
this function would have to handle them.

What I propose is the following new resource variables:

$FROMADDRNAME$
The username portion of the email address

$FROMADDRDOMAIN$
The domain name of the email address

Example:

[EMAIL PROTECTED]
$FROMADDRNAME$   => "nobody"
$FROMADDRDOMAIN$ => "foo.com"

-ewh
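
The split proposed above can be sketched in a few lines. This is a Python illustration only, not the MHonArc patch itself (which is Perl); the function name split_address is borrowed from the patch, and the handling of UUCP bang paths and of '%' as a separator is inferred from the diff fragment. The sample address nobody@foo.com is an assumption reconstructed from the expected results.

```python
import re

def split_address(addr):
    """Split an address into (user, domain), loosely following the patch."""
    # UUCP bang paths ("host!user") put the user in the last hop.
    machines = addr.split("!")
    last = machines[-1]
    if re.search(r"[@%]", last):
        # "user@domain" or "user%domain": keep the first two pieces.
        parts = re.split(r"[@%]", last)
        return parts[0], parts[1]
    # Bang path: the user is the last hop, the host is the hop before it.
    host = machines[-2] if len(machines) > 1 else ""
    return last, host
```

With this sketch, split_address("nobody@foo.com") gives ("nobody", "foo.com"), matching the $FROMADDRNAME$ and $FROMADDRDOMAIN$ values in the example.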




rcfile - passing arguments to

1999-02-27 Thread Jeff Breidenbach


Check out the -title and -ttitle command line options.

Cheers,
Jeff


>  I am using a resource file and I would like to be
> able to pass the  to it. I'd like to specify the listname at
> command line and then use it in the rcfile - is it possible to do this ?



delmsg + MAXSIZE windowing

1999-01-05 Thread Jeff Breidenbach


Hi all,

I commented out the following lines in delmsg in mhamain.pl, then
added a MAXSIZE of 3000 to the rcfile of an archive which had about
1 messages. The goal is a windowing effect, where a small MHonArc
database runs a big archive containing lots and lots of HTML message
files.

The results were unusual. First, the time required to add new messages
dropped from 20 minutes to less than three minutes. (GREAT!)  On the
flip side, the MHonArc indexes did not work as hoped.  The initial
thread index page showed no change (and continues to not change, even
as more messages are added). The first date index page now shows a
rather old set of messages, and also does not appear to update as new
messages are added.

Did I make an obvious mistake?

Thank you,
Jeff

PS This is obviously an unsupported topic and I don't want to waste
people's time - so please send me on my way if this topic is too
esoteric.

--

#&file_remove($filename);
#foreach $filename (split(/$X/o, $Derived{$key})) {
#   $pathname = (&OSis_absolute_path($filename)) ?
#   $filename :
#   join($DIRSEP, $OUTDIR, $filename);
#   if (-d $pathname) {
#   &dir_remove($pathname);
#   } else {
#   &file_remove($pathname);
#   }
#}



% mhonarc -v
  MHonArc v2.3.3 (Perl 5.00404)
  Copyright (C) 1995-1998  Earl Hood, [EMAIL PROTECTED]
  MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only
  under the terms of the GNU General Public License, which may be found in
  the MHonArc distribution.

The full rcfile is rather long... so I'm just including the bits that
seem most relevant.




1



300


x-archive-with-date:received:date


3000


Relevant command line options were

-add -nolock -savemem -quiet -rcfile rcfile -tidxfname index.html





MAXSIZE - poor man's windowing

1999-01-05 Thread Jeff Breidenbach


So here's a crazy high level idea about how to implement windowing
(i.e. having MHonArc only consider recent parts of the archive rather
than the whole thing when indexing new messages.) What do people
think?  Here's the scenario:

---

<MULTIPG>

<WINDOW>
3
</WINDOW>

<IDXSIZE>
100
</IDXSIZE>


So I start adding messages to the archive, and it grows and
grows... at 101 messages we have two date index pages (due to IDXSIZE
and MULTIPG).  At 201 we get three date index pages. So far everything
is normal.

However, when we get to message 301, it gets more interesting. The
database shrinks to size 200, (IDXSIZE * (WINDOW - 1)). The shrinkage
is like a MAXSIZE shrinkage, however the existing html message files
do not get deleted, nor does the index file we just orphaned. Both
stick around and are legacy html files; i.e. perfectly good HTML files
that have no representation in the MHonArc database.

Then as messages get added to the archive, eventually the database
size reaches > (WINDOW * IDXSIZE) and we repeat the process.
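
The scenario above can be simulated to see how the database size would move. This is only a sketch of the proposal as described: WINDOW is the hypothetical resource from this message, the function name is made up, and the assumption is that the shrink happens at the moment the database first exceeds WINDOW * IDXSIZE.

```python
def simulate_window(total_messages, idxsize=100, window=3):
    """Return (database_size, orphaned_messages) after adding messages one by one."""
    db_size = 0
    orphaned = 0  # messages whose HTML stays on disk but leave the database
    keep = idxsize * (window - 1)
    for _ in range(total_messages):
        db_size += 1
        if db_size > window * idxsize:
            # Shrink like MAXSIZE would, but leave the HTML files alone.
            orphaned += db_size - keep
            db_size = keep
    return db_size, orphaned
```

Under these assumptions the database never holds more than WINDOW * IDXSIZE entries, while the number of legacy HTML files grows without bound.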

-

Here are the advantages/disadvantages I thought of; I'm sure there are
others.

Advantages
--

* MHonArc would have the ability to handle large archives with a small
  database (saving on both memory and processing time)

* It might be sane to implement

Disadvantages
--

* It's not a perfectly generalized solution (i.e. your window size
  is quantized by IDXSIZE)

* It may not offer enough advantages over straight MAXSIZE to be worth
  adding complexity to the code.



Re: MAXSIZE - poor man's windowing

1998-12-30 Thread Jeff Breidenbach


>You can deal with the confusion of a half-indexed corpus of messages,
>because you have your hands on the construction of the site and know
>its structure inside out.  Is there anybody else who needs to access the
>site?  Having indices that work for some messages and not for others sounds
>like a gold plated way to convince would-be users that the resource is
>broken and unusable.  Just a thought.

The site is http://www.mail-archive.com and has quite a few users.  I
suspect nobody would notice if I pruned the MHonArc indexes to the
5000 most recent messages.  Given the current user interface, do you
think I will alienate users?

Jeff



MAXSIZE - poor man's windowing

1998-12-29 Thread Jeff Breidenbach


As we know, it can take a fair amount of time to add messages to
larger archives. I've seen filing times upwards of 20 minutes.
For me, long filing times are starting to become a bottleneck.

The usual solution for MHonArc is to split archives up, for example,
putting each month in a separate directory. Another possibility is to
use MAXSIZE to keep the archives from growing too large.

However, I'm thinking of another possibility. With MAXSIZE, new
messages are added, and, if necessary, old ones are deleted.  However,
if a MAXSIZE variant were to only delete entries in the database,
and not erase the html files, we'd still get the speed advantages of a
small database. It would also allow me to keep the old message pages
around.

Of course, there would not be any index pages for those old message
pages.  In my case that's ok, since I use a search engine to find old
messages anyway.

Any comments or thoughts?

Jeff






Re: how to make gifs not inline?

1998-12-05 Thread Jeff Breidenbach


Earl,

I see it now - there was a Content-Disposition in the headers after
all. I checked four times in the past, but I was stupidly looking at
the message headers as opposed to the MIME headers. (I don't know what
came over me!)

Thank you for the suggestion about MIMEargs settings, and I'm very
sorry for wasting your time on a false question.

Jeff



how to make gifs not inline?

1998-12-04 Thread Jeff Breidenbach


Looking at the docs, it appears image inlining is instigated by the
MIMEARGS defaults. I tried to override that for a mail with an
attached .gif and no Content-Disposition: header. It still got
inlined. What am I missing?

Thank  you in advance,
Jeff 
--

MHonArc v2.3.3 (Perl 5.00404)


m2h_external::filter; usename useicon subdir iconurl="../attachment.gif"
image/gif;




Time Zones (RFC)

1998-11-30 Thread Jeff Breidenbach


The Adventists' list filled in most of the blanks. I did not look for
discrepancies, but noticed some anyway in the New Zealand timezones.
Don't know which list is correct.

Jeff

---

%Zone = (
'ACDT', '-1030',# Australian Central Daylight
'ACST', '-0930',# Australian Central Standard
'ADT',   '0300',# (US) Atlantic Daylight
'AEDT', '-1100',# Australian East Daylight
'AEST', '-1000',# Australian East Standard
'AHDT',  '0900',
'AHST',  '1000',
'AST',   '0400',# (US) Atlantic Standard
'AT','0200',# Azores
'AWDT', '-0900',# Australian West Daylight
'AWST', '-0800',# Australian West Standard
'BAT',  '-0300',# Baghdad
'BDST', '-0200',# British Double Summer
'BET',   '1100',# Bering Standard
'BST',   '0300',# Brazil Standard
#   'BST',  '-0100',# British Summer
'BT',   '-0300',# Baghdad
'BZT2',  '0300',# Brazil Zone 2
'CADT', '-1030',# Central Australian Daylight
'CAST', '-0930',# Central Australian Standard
'CAT',   '1000',# Central Alaska
'CCT',  '-0800',# China Coast
'CDT',   '0500',# (US) Central Daylight
'CED',  '-0200',# Central European Daylight
'CET',  '-0100',# Central European
'CST',   '0600',# (US) Central Standard
'EAST', '-1000',# Eastern Australian Standard
'EDT',   '0400',# (US) Eastern Daylight
'EED',  '-0300',# Eastern European Daylight
'EET',  '-0200',# Eastern Europe
'EEST', '-0300',# Eastern Europe Summer
'EST',   '0500',# (US) Eastern Standard
'FST',  '-0200',# French Summer
'FWT',  '-0100',# French Winter
'GMT',   '0000',# Greenwich Mean
'GST',  '-1000',# Guam Standard
#   'GST',   '0300',# Greenland Standard
'HDT',   '0900',# Hawaii Daylight
'HST',   '1000',# Hawaii Standard
'IDLE', '-1200',# International Date Line East
'IDLW',  '1200',# International Date Line West
'IST',  '-0530',# Indian Standard
'IT',   '-0330',# Iran
'JST',  '-0900',# Japan Standard
'JT',   '-0700',# Java
'MDT',   '0600',# (US) Mountain Daylight
'MED',  '-0200',# Middle European Daylight
'MET',  '-0100',# Middle European
'MEST', '-0200',# Middle European Summer
'MEWT', '-0100',# Middle European Winter
'MST',   '0700',# (US) Mountain Standard
'MT',   '-0800',# Moluccas
'NDT',   '0230',# Newfoundland Daylight
'NFT',   '0330',# Newfoundland
'NT','1100',# Nome
'NST',  '-0630',# North Sumatra
#   'NST',   '0330',# Newfoundland Standard
'NZ',   '-1100',# New Zealand 
'NZST', '-1200',# New Zealand Standard  #-1300 NEW ZEALAND STD. SUMMER
'NZDT', '-1300',# New Zealand Daylight  
'NZT',  '-1200',# New Zealand   #NEW ZEALAND STD.
'PDT',   '0700',# (US) Pacific Daylight
'PST',   '0800',# (US) Pacific Standard
'ROK',  '-0900',# Republic of Korea
'SAD',  '-1000',# South Australia Daylight
'SAST', '-0900',# South Australia Standard
'SAT',  '-0900',# South Australia Standard
'SDT',  '-1000',# South Australia Daylight
'SST',  '-0200',# Swedish Summer
'SWT',  '-0100',# Swedish Winter
'USZ3', '-0400',# USSR Zone 3
'USZ4', '-0500',# USSR Zone 4
'USZ5', '-0600',# USSR Zone 5
'USZ6', '-0700',# USSR Zone 6
'UT',    '0000',# Universal Coordinated
'UTC',   '0000',# Universal Coordinated
'UZ10', '-1100',# USSR Zone 10
'WAT',   '0100',# West Africa
'WET',   '0000',# West European
'WST',  '-0800',# West Australian Standard
'YDT',   '0800',# Yukon Daylight
'YST',   '0900',# Yukon Standard
'ZP4',  '-0400',# USSR Zone 3
'ZP5',  '-0500',# USSR Zone 4
'ZP6',  '-0600',# USSR Zone 5
);
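
For readers unsure how to read the table: the values appear to be "hhmm" offsets to add to local time to reach UTC (EST is '0500', CET is '-0100'), so the sign is the opposite of the numeric zone in an RFC 822 date. A small Python sketch of that reading follows; the helper names are made up and the dictionary holds only a few entries copied from the table.

```python
from datetime import datetime, timedelta

# A few entries copied from the table; values are "hhmm" with an optional sign.
ZONE = {"EST": "0500", "CET": "-0100", "ACST": "-0930", "NDT": "0230"}

def offset_minutes(hhmm):
    """Convert a signed 'hhmm' string into minutes to add to local time."""
    sign = -1 if hhmm.startswith("-") else 1
    digits = hhmm.lstrip("+-")
    return sign * (int(digits[:2]) * 60 + int(digits[2:]))

def to_utc(local_time, zone_name):
    """Shift a naive local datetime to UTC using the table's convention."""
    return local_time + timedelta(minutes=offset_minutes(ZONE[zone_name]))
```

Noon EST on this reading becomes 17:00 UTC, and the half-hour zones such as ACST ('-0930') come out as -570 minutes.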



rcfiles cascade - wow!

1998-11-29 Thread Jeff Breidenbach


Hi all,

Did I mention I was impressed by MHonArc? I just
checked to see if rcfiles would cascade, i.e.

mhonarc -rcfile a -rcfile b mbox

It appears to work just like cascading stylesheets; i.e. rcfile a is
used, except where augmented or overridden by rcfile b. I am totally,
totally impressed. Wow. Sorry for cluttering the list with praise, but
that just knocked my socks off.
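
The cascading behaviour described here is roughly a left-to-right merge of settings. The Python sketch below models each rcfile as a plain dictionary, which is an assumption made for illustration; how MHonArc combines any particular resource internally may differ, and the values shown are invented.

```python
def cascade(*rcfiles):
    """Merge settings left to right; later files override earlier ones."""
    merged = {}
    for rc in rcfiles:
        merged.update(rc)
    return merged

rcfile_a = {"IDXSIZE": "300", "TITLE": "Archive"}   # hypothetical contents
rcfile_b = {"TITLE": "MHonArc list"}                 # hypothetical contents
settings = cascade(rcfile_a, rcfile_b)
```

Here settings keeps IDXSIZE from rcfile a and takes TITLE from rcfile b.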

Jeff



Time Zones (RFC)

1998-11-29 Thread Jeff Breidenbach


I recommend mentioning your philosophy with respect to your proposed
changes. Possible philosophies could be:

1) We support the minimum number of timezones required
   by RFC 822 by default.

2) We support a subset of popular timezones, in order
   to cover many common cases, by default.

3) We support all timezones listed in official 
   standard X by default.

a) We are extensible in full hour increments.
b) We are extensible in minute increments.
c) We extensibly deal with acronym namespace collisions
d) We extensibly deal with the historical changes of
   timezones; i.e.  mail sent from Libya in 1913 will be time zone
   corrected differently than mail sent from Libya in 1953

Quite frankly, getting it "right" requires a huge historical lookup
table, timezone offsets to the second, and other nightmares.  The
government timezone documents I was looking at (referenced from
http://www.bsdi.com/date) really were mind numbingly complex and
required constant revision.

I personally think philosophy 2a is quite adequate. I suspect nobody
would ever take advantage of philosophy 2b, (which appears to be what
you are suggesting) but have no fundamental opposition. For what it's
worth, I was getting maybe 3 emails out of 1000 with an unrecognized
timezone offset, and they were almost invariably MET and AEST. I think
the more obscure the timezone, the more likely we would get a
numerical offset in the email.

Jeff





TIMEZONE defaults

1998-11-20 Thread Jeff Breidenbach


I looked for official sounding timezone code and documents at
http://www.bsdi.com/date, and found it incomprehensible. Instead I
just used the list from the Adventists (mentioned earlier)

Java 1.2 has a method java.util.TimeZone.getAvailableIDs() which may
be a good source. Java tends to be pretty uptight about following
standards for this sort of thing. (And if you run this on a recent
Solaris maybe you'll at least get a POSIX list) Of course Java 1.2
final isn't out yet and I can't find out any more from the
documentation.

Anyway, at least now my logs will be a little less cluttered (AEST and
MET were the most common offenders.)

Jeff



AHDT:9
AHST:10
AST:4
ACDT:-10
ACST:-9
AEDT:-11
AEST:-10
AWDT:-9
AWST:-8
AT:2
BAT:-3
BET:11
BZT2:3
BDST:-2
BST:-1
CDT:5
CED:-2
CET:-1
CST:6
CCT:-8
EDT:4
EED:-3
EET:-2
EST:5
GMT:0
GST:-10
HDT:9
HST:10
IST:-5
IDLE:-12
IDLW:12
IT:-3
JST:-9
JT:-7
MED:-2
MET:-1
MT:-8
MDT:6
MST:7
NZST:-13
NZT:-12
NZS:-12
NZ:-11
NT:11
NST:-6
PDT:7
PST:8
SAD:-10
SDT:-10
SAST:-9
SAT:-9
SST:-7
UZ10:-11
USZ3:-4
USZ4:-5
USZ5:-6
USZ6:-7
UT:0
WAT:1
YDT:8
YST:9





TIMEZONE defaults

1998-11-18 Thread Jeff Breidenbach


Just curious; why are the default recognized time zones
not more comprehensive?

Are timezone acronyms not standardized?
Do RFCs only recommend knowing a few time zones?
Or is this a potential area for improvement?

Jeff

PS I didn't find an RFC or ISO standard during five minutes of poking
around, but did find some informal timezone listings at:

http://sonne.net/Vicious/time.html
http://news.adventist.org/sun/



2.3.3 RPM users : please update

1998-11-11 Thread Jeff Breidenbach


The 2.3.3 RPM you *really* want is MHonArc-2.3.3-2.noarch.rpm or
later.  Don't settle for less, or you'll run into a path glitch I made
during packaging.

Sorry,
Jeff



web based email services

1998-11-10 Thread Jeff Breidenbach


I think there are a few such systems kicking about. www.freshmeat.com
has a few listed (look under appindex on their site). I think one
is called "atdot" or something similar.

Anyway, I haven't used any of these myself and do not know
if any utilize MHonArc.

Jeff



uploaded RPM is really v2.3.3

1998-11-09 Thread Jeff Breidenbach


Sorry, I made a typo; the RPM I uploaded is not 2.3.2;
it is the latest 2.3.3. Expect to see it at ftp://contrib.redhat.com/noarch
in a few days.

Jeff



v2.3.2 RPM uploaded

1998-11-09 Thread Jeff Breidenbach


Presumably it will be available in a few days from
ftp://contrib.redhat.com/noarch

The RPM itself was revised (simplified) to take advantage of MHonArc's
improved install script.

Jeff



Re: Anyway to use include files (for nav bars) in a RCfile?

1998-11-04 Thread Jeff Breidenbach


>(BTW, the search engine on the mhonarc list on mail-archive.com is
>currently down, so I tried to search the archive before asking this!)

I upgraded mail-archive.com today, and it took several hours for the
new search engine to re-index everything. Anyway, things are back up
and running; you might even notice slight improvements in performance
when searching.

Also, for what it is worth, the HTML 4.0 specification (www.w3.org)
has a tag for "include this bit of HTML from a file right in here"; I
forget the tag name. The upside is it's exactly what you want; the
downside is I don't know of any browser that implements it.  Perhaps
it's a good choice if you want to be, ahem, ahead of the curve.

Jeff







Re: Return code 139 (bug?)

1998-10-27 Thread Jeff Breidenbach


>BTW, could you compress some of the data you put up on your FTP
>site?  The .mhonarc.db file is huge, and compressing it will make
>download much quicker.

Done. ftp://jab.org/db.tgz (It's still pretty huge.)

I will attempt the other diagnostics you suggested. I'd like to do
some more observation on the 2.3.0 archives as well, to get a better
characterization on whether the problem is deterministic or not. 

Jeff

PS You may want to wait until I've done another analysis before
cracking open the DB file... the preliminary results for 2.3.0
are promising.



Return code 139 (bug?)

1998-10-22 Thread Jeff Breidenbach


Hi all,

Here's an update to the error code 139 incident described earlier.
(like you care!, but hey...)

After digging around more on the return code 139, it again appears
to be database corruption, with a missing apostrophe in the DB file.

I don't know why two of my databases periodically get corrupted.  With
high hopes, I have just upgraded from 2.2.0 to 2.3.0, and am now using
the famous -nolock feature. We'll see what luck I have.

Jeff



2.3.0 RPM uploaded

1998-10-22 Thread Jeff Breidenbach


FYI,

A RedHat linux RPM has been uploaded and should show up in the next
few days. No changes were made to the RPM except a version update from
2.2.0 to 2.3.0

ftp://ftp.redhat.com/pub/crontib/noarch

Jeff



Return code 139 (bug?)

1998-10-20 Thread Jeff Breidenbach


Earl et al.,

I have been getting problems every few days or less on two of
the 125 lists I have identically configured with MHonArc. These two
lists are high traffic lists (other high traffic lists are doing ok,
so far). From today's logs of stderr and stdout, we have about
twenty successful runs, followed by a failure, which leaves a
.mhonarc.lck lying around.

Shown are portions of the last successful run, and the first
failure. Subsequent failures are due to the .mhonarc.lck file lying
around and have an exit code of 255.  The comments about return values
are from my wrapper script.
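
One possible reading of these numbers, assuming the wrapper reports the shell's exit status ($?): shells conventionally report 128 plus the signal number when a process is killed by a signal, so 139 would be signal 11 (SIGSEGV on Linux), i.e. the perl process itself crashed. The Python sketch below decodes a status on that assumption; the function name is made up, and 255 is treated as an ordinary exit status because it does not map to a valid signal.

```python
import signal

def describe_exit(status):
    """Interpret a shell-style exit status (the value of $? after a command)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a known signal number, so treat it as a plain exit status.
            return f"exited with status {status}"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {status}"
```

On a Linux host this reports 139 as a kill by signal 11 and 255 as a normal (if unhappy) exit, which fits the stale-lockfile failures that follow the first crash.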

I have placed a copy of the DB at ftp://jab.org/pub/.mhonarc.db
I have placed a copy of the rcfile at ftp://jab.org/pub/rcfile
I have placed the affected mail at ftp://jab.org/pub/catastrophe/
  noting that some of the error code 255 mail also get diverted
  to this directory.

MHonArc is called from a shell script; the relevant lines are
attached. You may assume all the shell variables are assigned to
reasonable values.

MHONARC=/usr/bin/mhonarc
FLAGS="-reverse -treverse -tidxfname index.html"
FLAGS="$FLAGS -rcfile $HOME/rcfile -savemem -tlevels 1"
FLAGS="$FLAGS -ttitle $ESCAPED_NAME -title $NICKNAME"
FLAGS="$FLAGS -idxsize 300 -multipg -add -nomailto"
$MHONARC $FLAGS $HOME/Mail/$FILTER

At no point are two MHonArc processes called at the same time.
Suggestions? Comments on this particular return code? 

Jeff



Reading database ...
Reading resource file: /home/archive/rcfile ...
Adding messages to .
Reading /home/archive/Mail/filter.1998.10.20-11:10:33-27409 ..
Writing mail 
Writing ./thrd81.html ...
Writing ./thrd82.html ...
Writing ./thrd83.html ...
Writing database ...
24804 messages
Successfully ran Mhonarc  

Now creating HTML archives.
Reading database ...
Reading resource file: /home/archive/rcfile ...
Adding messages to .
Reading /home/archive/Mail/filter.1998.10.20-11:10:01-2601 ..ERROR: MhonArc 
returned exit code 139.

[jeff@multivac jeff]# mhonarc -v
  MHonArc v2.2.0
  Copyright (C) 1995-1998  Earl Hood, [EMAIL PROTECTED]
  MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only
  under the terms of the GNU General Public License, which may be found in
  the MHonArc distribution.

[jeff@multivac jeff]# uname -a
Linux multivac.jab.org 2.0.32 #1 Wed Nov 19 00:46:45 EST 1997 i586 unknown

[jeff@multivac jeff]# perl -v
This is perl, version 5.004_04 built for i386-linux

Copyright 1987-1997, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.

[jeff@multivac jeff]# rpm -q perl
perl-5.004-4

[jeff@multivac Mail]# free
 total   used   free sharedbuffers cached
Mem:126788 124096   2692  29420  57324  27432
-/+ buffers/cache:  39340  87448
Swap:16092116  15976



Re: stale lockfile

1998-09-25 Thread Jeff Breidenbach


Regarding the stale lockfile - good news. I finally managed to capture
the error message (stderr is being logged...now) which was indicative
of database corruption. Perhaps this wasn't as intermittent as I
thought. Here's the message (MHonArc v2.2):

Reading database
Can't find string terminator "'" anywhere before \
EOF at ./.mhonarc.db line 59552.

Thank you for the tip about -savemem not being helpful for archiving
single messages.  My setup uses a daemon which sorts and processes an
inbox queue every so often. It's surprisingly easy to pull off with
the MH commands. In times of light traffic, it does one message at a
time. During heavy traffic, they pile up and it is much more of a
batch job. This seems to work well performance wise.

As for fcntl() and friends, I personally am planning to move to a
distributed filesystem in the medium/near future. I've been very
impressed with Coda, a free and much improved descendent of AFS, and
suspect it will grow quite popular with time. Thus I might prefer to
use the -nolock feature in the future, and trust my wrapper script to
keep things from happening concurrently.
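
A wrapper that keeps runs from overlapping could look something like the sketch below. It is written in Python for illustration, with a made-up function name and lock path, and it assumes a filesystem where exclusive file creation (O_EXCL) is atomic, which may not hold on every distributed filesystem.

```python
import os

def run_exclusive(lock_path, job):
    """Run job() only if lock_path can be created; remove the lock afterwards."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another run (or a stale lock) is in the way
    try:
        os.close(fd)
        job()
    finally:
        # Clean up even if job() raises, so an error does not leave a stale lock.
        os.remove(lock_path)
    return True
```

The finally clause covers ordinary errors in the job; it cannot help if the wrapper process itself is killed, so a stale lock would still need manual removal in that case.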

Jeff



stale lockfile

1998-09-24 Thread Jeff Breidenbach


Hello,

I am finding that MHonArc is intermittently leaving a lockfile around,
suggesting abnormal termination. I manually delete the file, and about
a day or two later it happens again. And again. 

The weird part is, I am archiving 100 identically configured lists,
and the problem consistently (although intermittently every few days)
only occurs for one list.

The list isn't my biggest, but it does have high traffic (about 20
messages a day) and just shy of 20,000 messages archived. Larger list
archives (albeit with lower traffic these days) seem to be doing
fine. Only one MHonArc runs at a time, and I believe I have enough
system resources. I use -savemem and -add, along with a lot of other
customizations.

Could I have a corrupt database? Do I need to rebuild it?
Any thoughts on trouble shooting? I haven't caught this behavior
during a manual run yet.

Jeff



Re: attachment names and "Message Not Available"

1998-09-16 Thread Jeff Breidenbach


Hi Earl,

Wow! Thank you. The suggestions worked great!  The emacsesque
extensibility of MHonArc continues to impress me.

Jeff



attachment names and "Message Not Available"

1998-09-15 Thread Jeff Breidenbach


Hi all,

Guess I'm on a roll (rut?) with MHonArc suggestions; I'm feeling
guilty not having contributed any code at all, yet making all sorts of
suggestions. 

Here goes with two more. Do these make sense? (Note I looked through
the archive and didn't see discussion on either of these, but may have
missed it.)

Jeff

(1)
It would be nice to be able to stifle "Message Not Available" messages
in the thread index. This would slightly beautify thread indexes, and
many archive perusers don't actually care if some message is not
available.

(2) MIME handling is great with MHonArc. However, many MIME objects get
stored in files with names like 000432.bin. I would prefer, from a
usability standpoint, to have the files stored under their attachment
name, for a couple of reasons.

First, some OS's like Windows put great significance on file name
extensions. Imagine a Windows person browsing a set of archives.
Having the .doc extension on a Microsoft Word document
renamed .bin turns a one step click-and-view into a multistep
renaming process.

I guess my first preference would be actually keeping the attachment
names, so I guess putting attachments in a subdirectory per message
would be required to avoid name space collisions. Not doing that,
I'd much rather see a naming scheme like 000432.doc 000433.xls so
at least browser and server Mime typing will work correctly.
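
The fallback naming scheme (000432.doc, 000433.xls) could be derived along these lines. This is a Python sketch with a made-up function name, not how MHonArc names files; it keeps the numeric counter and borrows only the extension from the attachment's own name, falling back to .bin when there is no usable extension.

```python
import os

def derived_name(counter, attachment_name, default_ext=".bin"):
    """Build a name like 000432.doc from a counter and the attachment's name."""
    ext = os.path.splitext(attachment_name or "")[1].lower()
    # Accept only simple alphanumeric extensions; anything else becomes .bin.
    if not ext[1:].isalnum():
        ext = default_ext
    return f"{counter:06d}{ext}"
```

Keeping the counter avoids name collisions in a shared directory, while the extension is enough for browser and server MIME typing to work.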








no more

1998-09-15 Thread Jeff Breidenbach


I always get the feeling that if I look hard enough in the Mhonarc
documentation, the answer to any question is sitting there. But I
couldn't find these:

1) I would like my index to be more confident about threading.
That means not using the disclaimer 
when things are unsure. Let mistakes be made! Is there any
way to turn off the disclaimer?

2) I've noticed there are many date resource variables for
use in the web pages, anything from 01/02/98 to the ISO
whateverwhatever official date string. But it would be nice
if there was a date string that said something like "Jan 4, 1998"
which has the advantage of being short, unambiguous to
those of us who can't remember whether month comes first or
day comes first in 02/02/02, oh, and year 2000 compliant!
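
To make the requested format concrete, here is a small Python sketch that produces strings like "Jan 4, 1998". The function name is made up, and the month abbreviations are spelled out in the code so the result does not depend on the locale.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def short_date(d):
    """Format a date as 'Mon D, YYYY' with no zero padding on the day."""
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
```

The four-digit year and the spelled-out month are what make the string unambiguous.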

You know you are working with a polished system when the questions
get this finely detailed! Sorry if the answers are already sitting in
the documentation - I didn't see them.

Jeff



Re: reproducible URLs

1998-09-10 Thread Jeff Breidenbach


>8 base-46 characters is sufficient to have a minuscule collision
>probability for archives of any reasonable size.

That's still only 44 bits of namespace. I guess it depends on
what you call reasonable risk; to me it feels a little high.

  Risk        # of Messages (approximately)
  ---------   -----------------------------
  1:100,000   18,000
  1:3000      100,000
  3:100       1,000,000

>A 100,000 message archive seems two orders of magnitude too high for
>MHonArc's basic design; anything that large using a filesystem as its
>database needs to be organized hierarchically.  That would add a
>subdirectory namespace into the quota.

Two orders of magnitude? I am running two archives that will exceed
100,000 messages in the next two years, at the rate they are
growing. Their current size, 50,000 messages apiece, works fine under
the ext2 (linux native) filesystem. I think a statistical limit of one
million is better, as that better reflects the largest lists out there
stored over many years.

While many filesystems bog down with a large number of files in a
particular directory, not all do. Performance with lots of files in a
directory is not an inherent problem; it is directly tied to the
design of the filesystem. 

An argument could be made that it doesn't make sense to compensate
for broken filesystems, whether due to some crazy 8.3 namespace
limitation, or due to braindead performance with lots of files. The
place for the fix would be in the filesystem and/or underlying OS, not
MHonArc. (Kind of like it didn't really make sense to convolute Java
applet code, just so the applet would work on a broken Netscape 2.01
browser.)

Jeff



reproducible URLs

1998-09-10 Thread Jeff Breidenbach


A while back there was a discussion of reproducible URLs (to avoid
messing up search engines when mail gets re-archived) and issues
surrounding randomness, probabilities, MD-5, message-ID, and 8.3
filenames.

Anyway, sorry I didn't jump in then, but the kind-of-fun question was
implicitly raised: how many bits of randomness do you need for
reproducible URLs in MHonArc?  (Hey, it's not every day that real life
questions can be tackled like problem sets!)

We know that the more messages there are, the more likely
that there will be a duplicated filename.  So, lets assume file names
have x bits of randomness and there are n messages. The probability of
collision is

    \sum_{i=0}^{n} \frac{i}{2^x} \;\approx\; \frac{1}{2^x} \int_0^n i \, di \;=\; \frac{n^2}{2^{x+1}}

which is the total likelihood of collisions over the total
number of possibilities (often called the sample space.)

So we want a low chance of collision for any likely size of n.  If
n=10^6 (about 2^20) and we want a one-in-a-hundred-thousand odds of
collision for such an extreme case, then x comes out to about 56.
That's 56 bits of randomness.
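
The arithmetic can be checked with a few lines of Python. This sketch uses the approximation n^2 / 2^(x+1) from above, which is only meaningful while the result is well below 1; the function names are made up.

```python
import math

def collision_probability(n, bits):
    """Approximate chance of any filename collision among n messages."""
    return n * n / 2 ** (bits + 1)

def bits_needed(n, max_probability):
    """Smallest whole number of bits keeping the estimate at or below max_probability."""
    return math.ceil(math.log2(n * n / max_probability) - 1)
```

For n = 10^6 and odds of one in a hundred thousand, bits_needed returns 56, which agrees with the figure worked out by hand.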

Well, in an 8.3 filename, with no case sensitivity, and only using
numbers and letters, we get almost 57 bits of randomness to play with,
using all 11 characters. No problem.

Now if we are restricted to ending the filenames with something like
.htm, then there are only about 41 bits of randomness, and then we
run about a 0.2% risk of collision for a puny n=100,000 message archive.
That's pushing it.

Ok, one last note. If we use a real filesystem, with upper and lower
case letters in the filenames, we'd still need 10 characters in the
filename to meet/exceed the acceptable safety margin (57 bits). So
those lower case letters don't help us much in the region we are
interested in.
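
The bit counts for the different filename schemes come from multiplying the number of characters by log2 of the alphabet size. A short Python sketch, with a made-up function name, for anyone who wants to try other combinations:

```python
import math

def filename_bits(chars, alphabet_size):
    """Bits of randomness in a name of `chars` symbols from a given alphabet."""
    return chars * math.log2(alphabet_size)

bits_8_3 = filename_bits(11, 36)         # all 11 characters, digits + one letter case
bits_htm = filename_bits(8, 36)          # extension fixed, 8 characters left
bits_mixed_case = filename_bits(10, 62)  # 10 characters, digits + both letter cases
```

Nine mixed-case characters fall short of the 57-bit target, which is why ten are needed.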

Using MD-5 checksums for filenames is complete overkill statistically
speaking. They are 128 bits, and would consume 20-odd characters in
the filename. 10 character filenames would do the trick nicely. There
is certainly no need to combine MD-5 and message-ID's from a
statistical standpoint.

Okay, sorry for the babbling! It was the repressed student inside
me.

Jeff



reproducible URLs

1998-09-02 Thread Jeff Breidenbach


I had been wracking my brains, trying to think of what could
possibly be improved with the wonderful MHonArc program.

And then it hit me. When I use MhonArc, I tend to think of it as a
renderer - I feed it email and it renders a bunch of HTML files.
On occasion, I will change something in the rcfile (or whatnot)
and rerender all the email messages.

Now sometimes that causes what was msg00587.html to turn into
msg00560.html. Not a big deal, except that it might leave a dozen
major internet search engines (and my little minor search engine) with
a bad idea of what is where, at least until the pages can be
re-indexed.

So, one potential feature for the future might be an option to use
reproducible filenames for messages. Like naming the file after the MD5
checksum of a message, or the message ID, or something else that is
statistically likely to be unique.
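
As a rough illustration of the idea, a filename could be derived from the Message-ID so that re-rendering the archive always yields the same URL. The Python sketch below is not an MHonArc feature; the function name, the "msg-" prefix, and the truncation to 10 hex digits are all assumptions made for the example.

```python
import hashlib

def message_filename(message_id, length=10):
    """Derive a stable HTML filename from a Message-ID string."""
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    # Keep only the first `length` hex digits (4 bits each) of the checksum.
    return f"msg-{digest[:length]}.html"
```

Because the name depends only on the Message-ID, changing the rcfile and re-rendering would not move a message to a different URL.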

Anyway, it's just a thought I wanted to throw out. It would be yet
another stripe of icing on a wonderful cake.

Jeff



Re: removing address from archives

1998-08-31 Thread Jeff Breidenbach


>Can you provide more information on what you are trying to achieve?

I am trying to achieve web pages where header fields for a message are
displayed normally (by default) except for the from: header field,
which is displayed without an email address. The goal is to continue
thwarting spambots yet still display some information about who sent
the mail.

Most specifically, I'd like to add a line "From: Earl Hood"
just before the Subject: header field in the archived message
http://www.mail-archive.com/mhonarc@ncsa.uiuc.edu/msg00569.html

Currently, I am excluding the From: field entirely using 
the following markup in the rcfile.


subject
date


Thanks for any insights,
Jeff



removing address from archives

1998-08-30 Thread Jeff Breidenbach


Excluding seems straightforward - EXCS or FIELDORDER will happily
exclude a field.

But... I don't know how to get the name in there;


From: $FROMNAME$


doesn't do the trick since FROMNAME is not available to FIELDSBEG

What am I missing?

.> 1. Remove the e-mail address from the archives.  So instead of showing:
.> From: "Matthew Andersen" <[EMAIL PROTECTED]>  I could have it show up as:
.> From: Matthew Andersen  Eliminating the actual address completely.  
.
.  You can use EXCS to exclude the field, and then use the $FROMNAME$
.  resource variable to specify author.