Re: Archiving by month

1999-09-06 Thread PS Mitchell

Hi Jeff,

I'm sorry you had problems with my previous date code - hope it hasn't
caused you much grief.  I just didn't realise there were mail clients out
there which produced malformed date strings (well, ones that date(1) can't
read at least).

The last patch to mailme I sent was (again) incorrect as it didn't
take care of multiple Received: lines, and the junk on a Received line
before the semicolon.  I've attached a patch (again from your original
source) which deals with this (it uses the top Received: line which I think
is safest), and also skips the date code completely unless it's a monthly
list, just so that any problems don't affect non-monthly lists at all.  But
provided x-archive-with-date is formatted OK, or the Received: lines are
sensible (they come from your server, so they are), malformed dates
should hopefully be a thing of the past.
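
For illustration, here is a minimal sketch of the idea (not the patch
itself): take only the topmost Received: header, join its folded
continuation lines, and keep what follows the last semicolon.  The
function name first_received_date is hypothetical, and the sketch assumes
plain awk and raw message headers on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of mailme: print the date stamp of the
# topmost Received: header read from stdin.
first_received_date() {
    awk '
    # Start collecting at the first Received: line only.
    !found && /^[Rr]eceived:/ { found = 1; buf = $0; next }
    # Folded continuation lines begin with whitespace; append them.
    found && /^[ \t]/ { sub(/^[ \t]+/, " "); buf = buf $0; next }
    # Any other line ends the header being collected.
    found { exit }
    END {
        # Fail if there was no Received: header or it has no semicolon.
        if (!found || buf !~ /;/) exit 1
        # The date stamp follows the last semicolon of the header.
        sub(/^.*;[ \t]*/, "", buf)
        print buf
    }'
}
```

Usage would be along the lines of `cat messageheaders | first_received_date`;
a non-zero exit status means no usable Received: header was found.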

With regard to attachments: I copied over your own MIMEargs (just added a
target=), and I've gone back and regenerated my archives here with your
rcfile alone: and yes, I think it's with text/plain attachments that the
msg* subdirectory gets created.  I don't know why they don't tack
.dir on the end for these attachments, at least here, but I'm sure
it's not a worry.

Paul


--- mailme.jeff Mon Sep  6 21:17:09 1999
+++ mailme  Mon Sep  6 21:01:19 1999
@@ -103,6 +103,25 @@
 } '
 }
 
+# Special case of grab() for Received: lines
+# This is because we want to take first occurrence
+# Example usage: cat messageheaders | receiveme
+receiveme() {
+$NAWK '
+BEGIN {
+m = "^Received:"
+}
+{
+if (match($0, m)) {
+print $0
+while ((getline) > 0 && $0 ~ /^[ \t]+/) {
+print $0
+}
+exit
+}
+} '
+}
+
 # Get all email addresses, and precede them with a carat.
 # Example usage: cat RMAIL | waterfall
 waterfall () {
@@ -318,10 +337,6 @@
 X13=`echo $T | grab mailing-list` #use later
 X14=`echo $T | grab list-post`
 
-# If indexing by month, we care about the date
-DATE=`echo $T | grab date`
-JUSTDATE=`echo $DATE | sed 's/^date: //i'`
-
 # Extract email addresses
 CHANCE=$(echo $TO $CC $X1 $X2 $X3 $X4 $X5 $X6 $X7 $X8 $X9 $X10 $X11 $X14 |\
 waterfall)
@@ -453,6 +468,29 @@
 MONFLAG=$HOME/vault/$ESCAPED_NAME/monthly
 if [ -f $MONFLAG ]
 then
+
+# If indexing by month, we care about the date
+# If importing, see if x-archive-with-date is set first.
+# Use Date: as last resort because of mis-set clocks.
+   XDATE=`echo $T | grab x-archive-with-date`
+   [ ! "$XDATE" ] && XDATE=`echo $T | receiveme`
+   [ ! "$XDATE" ] && XDATE=`echo $T | grab date`
+   if [ ! "$XDATE" ]
+   then
+   emergency_divert NODATE Unable to find any date field.
+   exit -1
+   fi
+   JUSTDATE=`echo $XDATE | sed -e 's/^date: //i' \
+  -e 's/^x-archive-with-date: //i' \
+  -e 's/^received:.*; //i'`
+   date -d "$JUSTDATE" > /dev/null 2>&1
+   ex=$?
+   if [ $ex != 0 ]
+   then
+   emergency_divert NODATE Unable to find valid date field.
+   exit -1
+   fi
+
 echok info Now switching to monthly indexing
 MM=`date -d "$JUSTDATE" +%Y-month-%m`
 MONBEG=`date -d "$JUSTDATE" +%m/01/%Y:00:00:00.00`



Re: Archiving by month

1999-09-06 Thread Earl Hood

On September 5, 1999 at 13:49, Jeff Breidenbach wrote:

> Paul, see how attachments end up in subdirectories, for example
> http:[EMAIL PROTECTED]/msg00459.html
> The default rcfile puts attachments in a subdirectory, with a .dir
> extension. You are probably overriding the MIMEArgs directive, or
> perhaps .html attachments are treated differently.

I think the problem is a change in mhexternal.pl of MHonArc in
the naming of attachment subdirectories.  It appears I did not
mention it in CHANGES, but here is the SCCS delta comment on it:

D 2.7 99/06/25 13:59:18+05:00 [EMAIL PROTECTED] 22 21  3/3/228
P /home/ehood/work/perl/MHonArc/lib/mhexternal.pl
C Removed addition of .dir to subdir.

According to the date, it was applicable for v2.4.0 or v2.4.1.  I
cannot remember the exact reason for the change, but some user had
problems with the .dir so I figured no harm (ha ha) would occur
if I removed the .dir.

I do not know how htdig works, but can it index a specified list of
file types (e.g. .html, .txt), or can you specify a regex/glob mask
(or match) to control indexing?

--ewh



Re: Archiving by month

1999-09-06 Thread Jeff Breidenbach


Hi htdig folks,

I'm having a bit of a problem getting what I want from the htdig
configuration options. Lots of people, myself included, use htdig in
conjunction with MHonArc. In the current release version of MHonArc
(2.4.3, which I recently upgraded to) attachments may be stored in
subdirectories as follows:

The first URL is the message, while the second is the attachment.
No need to follow the links, just look at their structure.

http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html
http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt

My question is, using the current stable version of htdig, how
can I configure it to ONLY index messages, and not index attachments?
If I could say "ignore everything that does not end in .html" or
"only index URLs matching a certain regexp", that would do the trick.
But with the current configuration options, I just don't see how to do
this.
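
To make the distinction concrete, here is a small sketch of the kind of
regexp match meant above, written as a plain shell filter over a list of
URLs (one per line).  It is NOT an htdig option, only an illustration;
the name messages_only is hypothetical, and the pattern assumes message
pages are always named msgNNNNN.html as in the two URLs shown.

```shell
#!/bin/sh
# Hypothetical filter: keep only URLs that look like MHonArc message
# pages (.../msgNNNNN.html) and drop anything under a msgNNNNN/
# attachment subdirectory, even if the attachment itself ends in .html.
messages_only() {
    grep -E '/msg[0-9]+\.html$'
}
```

Note that such a pattern would also drop index pages, which may or may
not be wanted.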

Thanks in advance for enlightenment.

Jeff




Re: Archiving by month

1999-09-05 Thread PS Mitchell

Jeff Breidenbach said:

> Solution sets that I see are:
>
>   b1) ask mailme to do monthly sorts off of x-archive-with-date
>       headers primarily, received headers secondarily
>   b2) modify bounce.pl to generate x-archive-with-date based on
>       received headers.

Attached is b1), then.  I've added some sanity checking for the date
field so it should be pretty robust.  With this implemented, it means
that mailme expects bounce.pl (or equivalent) to set the
x-archive-with-date field intelligently, and doesn't do any second
guessing, which is how I think it should be.  If it doesn't find
x-archive-with-date, it looks for Received:, and if that's not set it
falls back to Date:, which is the way Jeff's rcfile deals with dates
too, so it is consistent.  So now mailme and rcfile agree that, if
you're importing, you HAVE to set an intelligent x-archive-with-date
field; else, they both use Received:, which goes by the clock on
Jeff's server.
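
The fallback order described above can be sketched on its own as
follows.  This is a simplified illustration rather than the patch:
header_value and pick_date are hypothetical names, and the sketch
assumes the headers arrive on stdin already unfolded (one header per
line), which real messages often are not.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback order; header_value and pick_date
# are illustrative names, not functions from mailme.

# Print the value of the first header named $1 (case-insensitive),
# reading unfolded headers from stdin.  Exit status 1 if absent.
header_value() {
    awk -v name="$1" '
    BEGIN { prefix = tolower(name) ":" }
    index(tolower($0), prefix) == 1 {
        val = substr($0, length(prefix) + 1)
        sub(/^[ \t]+/, "", val)
        print val
        found = 1
        exit
    }
    END { exit !found }'
}

# Choose the date string: x-archive-with-date first, then the date
# after the last semicolon of the top Received: line, then Date:.
pick_date() {
    headers=$(cat)
    d=$(printf '%s\n' "$headers" | header_value x-archive-with-date)
    [ -z "$d" ] && d=$(printf '%s\n' "$headers" | header_value received | sed 's/^.*; *//')
    [ -z "$d" ] && d=$(printf '%s\n' "$headers" | header_value date)
    [ -n "$d" ] || return 1
    printf '%s\n' "$d"
}
```

A caller would still need to check that the result parses as a date, as
the patch does with date -d, before using it for monthly indexing.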

I won't hack bounce.pl for fear of doing wrong, but yes, I'm pretty
sure I agree now that it should set x-archive-with-date from the
original Received: fields if possible.

Paul


--- mailme.jeff Sun Sep  5 14:08:19 1999
+++ mailme  Sun Sep  5 15:25:07 1999
@@ -319,8 +319,26 @@
 X14=`echo $T | grab list-post`
 
 # If indexing by month, we care about the date
-DATE=`echo $T | grab date`
-JUSTDATE=`echo $DATE | sed 's/^date: //i'`
+# If importing, see if x-archive-with-date is set first.
+# Use Date: as last resort because of mis-set clocks.
+XDATE=`echo $T | grab x-archive-with-date`
+[ ! "$XDATE" ] && XDATE=`echo $T | grab received`
+[ ! "$XDATE" ] && XDATE=`echo $T | grab date`
+if [ ! "$XDATE" ]
+then
+   emergency_divert NODATE Unable to find any date field.
+   exit -1
+fi
+JUSTDATE=`echo $XDATE | sed -e 's/^date: //i' \
+  -e 's/^x-archive-with-date: //i' \
+  -e 's/^received: //i'`
+date -d "$JUSTDATE" > /dev/null 2>&1
+ex=$?
+if [ $ex != 0 ]
+then
+   emergency_divert NODATE Unable to find valid date field.
+   exit -1
+fi
 
 # Extract email addresses
 CHANCE=$(echo $TO $CC $X1 $X2 $X3 $X4 $X5 $X6 $X7 $X8 $X9 $X10 $X11 $X14 |\