Re: Archiving by month

1999-09-06 Thread PS Mitchell

Hi Jeff,

I'm sorry you had problems with my previous date code - hope it hasn't
caused you much grief.  I just didn't realise there were mail clients out
there which produced malformed date strings (well, ones that date(1) can't
read at least).

The last patch to mailme I sent was (again) incorrect as it didn't
take care of multiple Received: lines, and the junk on a Received line
before the semicolon.  I've attached a patch (again from your original
source) which deals with this (it uses the top Received: line which I think
is safest), and also skips the date code completely unless it's a monthly
list, so that any problems can't affect non-monthly lists at all.  As
long as x-archive-with-date is well formed, or the Received: lines are
sensible (they come from your server, so they should be), malformed dates
should hopefully be a thing of the past.
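
As a rough illustration of the Received: handling described above, here is
a minimal sketch. It is not the code from the patch: the helper name
received_date and the sample header are made up for the example, and it
assumes the timestamp is whatever follows the last semicolon.

```shell
# Hypothetical helper (not part of mailme): pull the timestamp out of a
# Received: header by discarding everything up to the last semicolon.
received_date() {
    # Join folded continuation lines into one line, then strip through
    # the final semicolon and trim any trailing whitespace.
    tr '\n' ' ' | sed -e 's/^.*;[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Sample header, invented for illustration only.
sample='Received: from mx.example.org (mx.example.org [192.0.2.1])
    by archive.example.com with SMTP id ABC123;
    Mon, 6 Sep 1999 21:01:19 -0700'

printf '%s\n' "$sample" | received_date
```

The resulting string is what would then be handed to date -d for
validation, as the patch does with JUSTDATE.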

With regard to attachments: I copied over your own MIMEargs (just added a
target=), and I've gone back and regenerated my archives here with your
rcfile alone: and yes, I think it's with text/plain attachments that the
msg* subdirectory gets created.  I don't know why they don't tack
.dir on the end for these attachments, at least here, but I'm sure
it's not a worry.

Paul


--- mailme.jeff Mon Sep  6 21:17:09 1999
+++ mailme  Mon Sep  6 21:01:19 1999
@@ -103,6 +103,25 @@
 } '
 }
 
+# Special case of grab() for Received: lines
+# This is because we want to take first occurrence
+# Example usage: cat messageheaders | receiveme
+receiveme() {
+$NAWK '
+BEGIN {
+m = "^Received:"
+}
+{
+if (match($0, m)) {
+print $0
+getline
+while ( $0 ~ /^[ \t]+/) {
+print $0; getline
+}
+}
+} '
+}
+
 # Get all email addresses, and precede them with a carat.
 # Example usage: cat RMAIL | waterfall
 waterfall () {
@@ -318,10 +337,6 @@
 X13=`echo $T | grab mailing-list` #use later
 X14=`echo $T | grab list-post`
 
-# If indexing by month, we care about the date
-DATE=`echo $T | grab date`
-JUSTDATE=`echo $DATE | sed 's/^date: //i'`
-
 # Extract email addresses
 CHANCE=$(echo $TO $CC $X1 $X2 $X3 $X4 $X5 $X6 $X7 $X8 $X9 $X10 $X11 $X14 |\
 waterfall)
@@ -453,6 +468,29 @@
 MONFLAG=$HOME/vault/$ESCAPED_NAME/monthly
 if [ -f $MONFLAG ]
 then
+
+# If indexing by month, we care about the date
+# If importing, see if x-archive-with-date is set first.
+# Use Date: as last resort because of mis-set clocks.
+   XDATE=`echo $T | grab x-archive-with-date`
+   [ ! "$XDATE" ] && XDATE=`echo $T | receiveme`
+   [ ! "$XDATE" ] && XDATE=`echo $T | grab date`
+   if [ ! "$XDATE" ]
+   then
+   emergency_divert NODATE Unable to find any date field.
+   exit -1
+   fi
+   JUSTDATE=`echo $XDATE | sed -e 's/^date: //i' \
+  -e 's/^x-archive-with-date: //i' \
+  -e 's/^received:.*; //i'`
+   date -d "$JUSTDATE" > /dev/null 2>&1
+   ex=$?
+   if [ $ex != 0 ]
+   then
+   emergency_divert NODATE Unable to find valid date field.
+   exit -1
+   fi
+
 echok info Now switching to monthly indexing
 MM=`date -d "$JUSTDATE" +%Y-month-%m`
 MONBEG=`date -d "$JUSTDATE" +%m/01/%Y:00:00:00.00`



Re: Archiving by month

1999-09-06 Thread Earl Hood

On September 5, 1999 at 13:49, Jeff Breidenbach wrote:

> Paul, see how attachments end up in subdirectories, for example
> http:[EMAIL PROTECTED]/msg00459.html
> The default rcfile puts attachments in a subdirectory, with a .dir
> extension. You are probably overriding the MIMEArgs directive, or
> perhaps .html attachments are treated differently.

I think the problem is a change in mhexternal.pl of MHonArc in
the naming of attachment subdirectories.  It appears I did not
mention it in CHANGES, but here is the SCCS delta comment on it:

D 2.7 99/06/25 13:59:18+05:00 [EMAIL PROTECTED] 22 21  3/3/228
P /home/ehood/work/perl/MHonArc/lib/mhexternal.pl
C Removed addition of .dir to subdir.

According to the date, it was applicable for v2.4.0 or v2.4.1.  I
cannot remember the exact reason for the change, but some user had
problems with the .dir so I figured no harm (ha ha) would occur
if I removed the .dir.

I do not know how htdig works, but can it index only a specified list
of file types (e.g. .html, .txt), or can you specify a regex/glob mask
(or match) to control indexing?

--ewh



Re: Archiving by month

1999-09-06 Thread Jeff Breidenbach


Hi htdig folks,

I'm having a bit of a problem getting what I want from the htdig
configuration options. Lots of people, myself included, use htdig in
conjunction with MHonArc. In the current release version of MHonArc
(2.4.3, which I recently upgraded to) attachments may be stored in
subdirectories as follows:

The first URL is the message, while the second is the attachment.
No need to follow the links, just look at their structure.

http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html
http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt

My question is, using the current stable version of htdig, how
can I configure it to ONLY index messages, and not index attachments?
If I could say "ignore everything that does not end in .html" or
"only index URLs matching a certain regexp", that would do the trick.
But with the current configuration options, I just don't see how to do
this.
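
To make the rule concrete, here it is as a small shell sketch. This is
not htdig configuration, only the filter I have in mind; the function name
is_message_url is made up, and the pattern assumes message pages are
always named msgNNNNN.html.

```shell
# Hypothetical filter: succeed only for URLs that look like MHonArc
# message pages (msgNNNNN.html); everything else counts as unwanted.
is_message_url() {
    printf '%s\n' "$1" | grep -Eq '/msg[0-9]+\.html$'
}

for url in \
    'http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html' \
    'http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt'
do
    if is_message_url "$url"; then
        echo "index: $url"
    else
        echo "skip:  $url"
    fi
done
```

Matching on the message file name rather than just the .html suffix also
keeps out .html attachments stored under a msgNNNNN/ subdirectory.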

Thanks in advance for enlightenment.

Jeff