Re: Archiving by month
Hi Jeff,

I'm sorry you had problems with my previous date code; I hope it hasn't caused you much grief. I just didn't realise there were mail clients out there that produce malformed date strings (or at least strings that date(1) can't read).

The last patch to mailme I sent was (again) incorrect: it didn't handle multiple Received: lines, or the junk on a Received: line before the semicolon. I've attached a patch (again against your original source) which deals with this. It uses the top Received: line, which I think is safest, and it also skips the date code completely unless it's a monthly list, so that any problems can't affect non-monthly lists at all. Provided x-archive-with-date is formatted correctly, or the Received: lines are sensible (they come from your server, so they are), malformed dates should hopefully be a thing of the past.

With regard to attachments: I copied over your own MIMEArgs (just adding a target=), went back and regenerated my archives here with your rcfile alone, and yes, I think it's text/plain attachments that get the msg* subdirectory. I don't know why .dir isn't tacked on the end for these attachments, at least here, but I'm sure it's not a worry.

Paul

--- mailme.jeff Mon Sep 6 21:17:09 1999
+++ mailme Mon Sep 6 21:01:19 1999
@@ -103,6 +103,25 @@
 }
 '
 }
+# Special case of grab() for Received: lines
+# This is because we want to take first occurrence
+# Example usage: cat messageheaders | receiveme
+receiveme() {
+$NAWK '
+BEGIN {
+m = "^Received:"
+}
+{
+if (match($0, m)) {
+print $0
+getline
+while ( $0 ~ /^[ \t]+/) {
+print $0; getline
+}
+}
+} '
+}
+
 # Get all email addresses, and precede them with a carat.
 # Example usage: cat RMAIL | waterfall
 waterfall () {
@@ -318,10 +337,6 @@
 X13=`echo $T | grab mailing-list` #use later
 X14=`echo $T | grab list-post`
 
-# If indexing by month, we care about the date
-DATE=`echo $T | grab date`
-JUSTDATE=`echo $DATE | sed 's/^date: //i'`
-
 # Extract email addresses
 CHANCE=$(echo $TO $CC $X1 $X2 $X3 $X4 $X5 $X6 $X7 $X8 $X9 $X10 $X11 $X14 |\
   waterfall)
@@ -453,6 +468,29 @@
 MONFLAG=$HOME/vault/$ESCAPED_NAME/monthly
 if [ -f $MONFLAG ]
 then
+
+# If indexing by month, we care about the date
+# If importing, see if x-archive-with-date is set first.
+# Use Date: as last resort because of mis-set clocks.
+  XDATE=`echo $T | grab x-archive-with-date`
+  [ ! "$XDATE" ] && XDATE=`echo $T | receiveme`
+  [ ! "$XDATE" ] && XDATE=`echo $T | grab date`
+  if [ ! "$XDATE" ]
+  then
+    emergency_divert NODATE Unable to find any date field.
+    exit -1
+  fi
+  JUSTDATE=`echo $XDATE | sed -e 's/^date: //i' \
+    -e 's/^x-archive-with-date: //i' \
+    -e 's/^received:.*; //i'`
+  date -d "$JUSTDATE" >/dev/null 2>&1
+  ex=$?
+  if [ $ex != 0 ]
+  then
+    emergency_divert NODATE Unable to find valid date field.
+    exit -1
+  fi
+
   echok info Now switching to monthly indexing
   MM=`date -d "$JUSTDATE" +%Y-month-%m`
   MONBEG=`date -d "$JUSTDATE" +%m/01/%Y:00:00:00.00`
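The comment on receiveme asks for the first occurrence, but as posted the awk program does not stop after the first Received: header. A minimal standalone sketch of the intended behaviour, assuming raw header lines on stdin and that the date is whatever follows the last semicolon of the topmost Received: header (the name first_received_date is hypothetical, not part of mailme):

```shell
# Hypothetical helper, not part of mailme: print the date from the FIRST
# (topmost) Received: header only. Assumes raw header lines on stdin;
# continuation lines (leading space or tab) are joined onto the header.
first_received_date() {
  awk '
    # While inside the first Received: header, append continuation lines.
    collecting && /^[ \t]/ { hdr = hdr " " $0; next }
    # Any other line ends that header, so stop reading.
    collecting { exit }
    tolower($0) ~ /^received:/ { hdr = $0; collecting = 1 }
    END {
      if (hdr == "") exit 1          # no Received: header at all
      sub(/^.*;[ \t]*/, "", hdr)     # drop everything up to the last ";"
      gsub(/[ \t]+/, " ", hdr)       # squeeze runs of whitespace
      print hdr
    }'
}

# Example: two Received: headers, date on a continuation line.
printf '%s\n' \
  'Received: from relay.example.org by mail.example.com;' \
  '    Mon, 6 Sep 1999 21:01:19 +1000' \
  'Received: from client.example.net by relay.example.org;' \
  '    Mon, 6 Sep 1999 20:59:02 +1000' \
  'Subject: test' | first_received_date
# prints: Mon, 6 Sep 1999 21:01:19 +1000
```

The result can then be checked and turned into a folder name the same way the patch does, with GNU date -d and +%Y-month-%m.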
Re: Archiving by month
On September 5, 1999 at 13:49, Jeff Breidenbach wrote:

> Paul, see how attachments end up in subdirectories, for example
> http:[EMAIL PROTECTED]/msg00459.html
>
> The default rcfile puts attachments in a subdirectory, with a .dir
> extension. You are probably overriding the MIMEArgs directive, or
> perhaps .html attachments are treated differently.

I think the problem is a change in MHonArc's mhexternal.pl in the naming of attachment subdirectories. It appears I did not mention it in CHANGES, but here is the SCCS delta comment on it:

D 2.7 99/06/25 13:59:18+05:00 [EMAIL PROTECTED] 22 21 3/3/228
P /home/ehood/work/perl/MHonArc/lib/mhexternal.pl
C Removed addition of .dir to subdir.

According to the date, it was applicable for v2.4.0 or v2.4.1. I cannot remember the exact reason for the change, but some user had problems with the .dir, so I figured no harm (ha ha) would occur if I removed it.

I do not know how htdig works, but can it index a specified list of file types (e.g. .html, .txt), or can you specify a regex/glob mask (or match) to control indexing?

--ewh
Re: Archiving by month
Hi htdig folks,

I'm having a bit of a problem getting what I want from the htdig configuration options. Lots of people, myself included, use htdig in conjunction with MHonArc. In the current release version of MHonArc (2.4.3, which I recently upgraded to), attachments may be stored in subdirectories as follows. The first URL is the message, while the second is the attachment. No need to follow the links; just look at their structure.

http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html
http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt

My question is: using the current stable version of htdig, how can I configure it to index ONLY messages, and not attachments? If I could say "ignore everything that does not end in .html", or "only index URLs matching a certain regexp", that would do the trick. But with the current configuration options, I just don't see how to do this.

Thanks in advance for enlightenment.

Jeff
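Whether htdig can be told to do this is the open question here, but the pattern that separates the two kinds of URL is easy to state. A minimal sketch with grep -E, assuming message pages are always named msgNNNNN.html directly under the month directory, as in the two example URLs (is_message_url is a hypothetical helper, not an htdig setting):

```shell
# Hypothetical helper, not an htdig option: succeed only for MHonArc
# message pages (.../msgNNNNN.html). Attachment URLs, which live under a
# msgNNNNN/ or msgNNNNN.dir/ subdirectory, normally do not match.
is_message_url() {
  printf '%s\n' "$1" | grep -Eq '/msg[0-9]+\.html$'
}

# Example: filter a list of URLs down to the message pages.
for url in \
  'http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html' \
  'http://mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174/The_state_i_am_in.txt'
do
  if is_message_url "$url"; then
    echo "index: $url"    # only the msg00174.html URL is printed
  fi
done
```

Because it tests the file name rather than the subdirectory, the same pattern would work whether or not MHonArc appends .dir to attachment directories.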
Re: Archiving by month
Jeff Breidenbach said:

> Solution sets that I see are:
> b1) ask mailme to do monthly sorts off of x-archive-with-date headers
> primarily, received headers secondarily
> b2) modify bounce.pl to generate x-archive-with-date based on received
> headers.

Attached is b1), then. I've added some sanity checking for the date field, so it should be pretty robust. With this in place, mailme expects bounce.pl (or equivalent) to set the x-archive-with-date field intelligently and doesn't do any second-guessing, which is how I think it should be. If it doesn't find x-archive-with-date, it looks for Received:, and if that's not set it falls back to Date:. That is the way Jeff's rcfile deals with dates too, so the two are consistent: if you're importing, you HAVE to set an intelligent x-archive-with-date field; otherwise, both use Received:, which goes by the clock on Jeff's server.

I won't hack bounce.pl for fear of doing it wrong, but yes, I'm now pretty sure I agree that it should set x-archive-with-date from the original Received: fields if possible.

Paul

--- mailme.jeff Sun Sep 5 14:08:19 1999
+++ mailme Sun Sep 5 15:25:07 1999
@@ -319,8 +319,26 @@
 X14=`echo $T | grab list-post`
 
 # If indexing by month, we care about the date
-DATE=`echo $T | grab date`
-JUSTDATE=`echo $DATE | sed 's/^date: //i'`
+# If importing, see if x-archive-with-date is set first.
+# Use Date: as last resort because of mis-set clocks.
+XDATE=`echo $T | grab x-archive-with-date`
+[ ! "$XDATE" ] && XDATE=`echo $T | grab received`
+[ ! "$XDATE" ] && XDATE=`echo $T | grab date`
+if [ ! "$XDATE" ]
+then
+  emergency_divert NODATE Unable to find any date field.
+  exit -1
+fi
+JUSTDATE=`echo $XDATE | sed -e 's/^date: //i' \
+  -e 's/^x-archive-with-date: //i' \
+  -e 's/^received: //i'`
+date -d "$JUSTDATE" >/dev/null 2>&1
+ex=$?
+if [ $ex != 0 ]
+then
+  emergency_divert NODATE Unable to find valid date field.
+  exit -1
+fi
 
 # Extract email addresses
 CHANCE=$(echo $TO $CC $X1 $X2 $X3 $X4 $X5 $X6 $X7 $X8 $X9 $X10 $X11 $X14 |\
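The b1) selection order plus the sanity check can be sketched as a standalone shell function. It assumes the three header values have already been extracted with their prefixes removed, and it relies on GNU date's -d option, as mailme does; archive_month is a hypothetical name, and its error messages only mimic the patch's NODATE diversions.

```shell
# Hypothetical helper, not part of mailme. Arguments, in priority order:
# the x-archive-with-date value, the top Received: date, the Date: value,
# each already stripped of its "Header: " prefix (empty if absent).
# Like the patch, it picks the first NON-EMPTY value and only then checks
# it; an unparseable value is rejected rather than skipped.
# Requires GNU date for -d, as mailme itself does.
archive_month() {
  candidate=""
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      candidate=$value
      break
    fi
  done
  if [ -z "$candidate" ]; then
    echo "NODATE: unable to find any date field" >&2
    return 1
  fi
  # Same sanity check as the patch: can date(1) parse it at all?
  if ! date -d "$candidate" >/dev/null 2>&1; then
    echo "NODATE: unable to find valid date field" >&2
    return 1
  fi
  # Monthly folder name in the form mailme builds with +%Y-month-%m.
  date -d "$candidate" +%Y-month-%m
}

# Example: no x-archive-with-date, so the Received: date wins over a
# Date: header written by a client with a mis-set clock.
archive_month "" "Mon, 6 Sep 1999 21:01:19 +1000" "Tue, 15 Jan 1980 12:00:00 +0000"
# prints: 1999-month-09
```

Keeping the choice (first non-empty header) separate from the check (date -d) mirrors the patch: a present but garbled x-archive-with-date is diverted instead of silently falling through to Received:.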