Hi Jeff,

First off, I made a silly mistake in monthme: producing links to HTML files in subdirectories of course doesn't work, because none of the relative links resolve. This is quickly solved by linking to the directory instead, which makes a lot more sense anyhow. The patch I've attached (see below) fixes this.
> I applied your patch to the monthly index page generator -- and it
> looks like quite an improvement. However, relying on the 'Date:'
> header is definitely hurting, and something is inconsistent. This may
> take some work, possibly including perhaps bounce.pl modifications.
> See:
>
> http://www.mail-archive.com/sinister@majordomo.net/1904-month-06/maillist.html
> http://www.mail-archive.com/sinister@majordomo.net/1904-month-06/msg00000.html

I just saw these and nearly died. Actually the code's doing what it should: these dates were actually *in* the mails, courtesy of misconfigured clients. I'd skimmed through the discussion in gossip on "received date" and stupidly assumed I didn't have a problem with my list. Those who don't learn from history are doomed to repeat it.

So I don't think it's a code problem; it goes back instead to the question of how to manage imports. Maybe bounce.pl could do a sanity check on "Date:" vs. "Received:" within some bounds (Michael?), but that's all I can suggest. Right now I can't see anywhere the mailme/monthme code is flaky in its date handling, although I could be missing something.

Jeff - sorry for these. I think it's best to let my list accumulate for now. I'll try to deal with it later and maybe ask you just to zap those, if I can make it easy for you to do: I'll send a simple script or something if that's ok.

> The patches to digger/rcfile are good, but I am concerned about htdig
> trying to index attachments, which I don't want it to do. I'm afraid
> of htdig getting bogged down with some weird mime type. Thus, I
> haven't applied the patches until this issue gets
> resolved. Suggestions?

Yes, htdig going somewhere it shouldn't might be a problem, I can see that. But Jeff, hasn't it been indexing attachments already anyway? limit_urls_to was set to foo/msg, which is where the attachments are stored too, isn't it? But looking back, I think you're right that my solution (removing "msg"!) isn't very robust.
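Coming back for a moment to the "Date:" vs. "Received:" sanity check mentioned above, here is a rough sketch of what such a check could look like. It is only an illustration and not anything from bounce.pl: the function name date_is_sane and the 7-day default bound are made up for the example, and it assumes GNU date, whose -d option can parse mail-style dates.

```shell
#!/bin/sh
# Hypothetical helper (not part of bounce.pl): succeed if the "Date:" value
# lies within MAX_SKEW seconds of the date from the newest "Received:" header.
# Assumption: GNU date is available; -d parses RFC 822 style date strings.
date_is_sane() {
    hdr_date=$1               # value of the Date: header
    recv_date=$2              # date part of the most recent Received: header
    max_skew=${3:-604800}     # allowed difference in seconds (default 7 days)

    # Convert both dates to seconds since the epoch; an unparsable date
    # counts as "not sane".
    sent=$(date -u -d "$hdr_date" +%s 2>/dev/null) || return 1
    recv=$(date -u -d "$recv_date" +%s 2>/dev/null) || return 1

    # Absolute difference between the two timestamps.
    diff=$((sent - recv))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi

    [ "$diff" -le "$max_skew" ]
}

# Example: a mail claiming to be from 1904 that was received in 1999.
if date_is_sane "6 Jun 1904 10:00:00 +0000" "4 Sep 1999 16:50:00 +0000"; then
    echo "Date: header looks plausible"
else
    echo "Date: header is out of bounds, fall back to Received:"
fi
```

What to do with a message that fails the check (file it under the received date, or hold it for review) is a separate decision that the sketch deliberately leaves open.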
In case anyone's following this, the problem is how to ask htdig to index the following paths under a root starting point of maillist.html:

    msg*.html
    ????-month-??/msg*.html

You can add multiple "patterns" to limit_urls_to, but you can't use wildcards ("patterns"??), and multiple strings are or'd, not and'd. And you don't know how many ????-month-??'s there are. And if you just set it to "msg", you run the risk of a list with "msg" in its name. You also can't do anything creative with a combination of "limit_urls_to" and "limit_normalized" (I tried). Oh, and you can't use "exclude_urls", because you don't know the index file names for customised lists or whatever else might end up in there.

I'm sending you two more diffs with my proposed solution, again diffs against your original source. I propose that monthme writes out a list of files for htdig to index, and that htdig in digger uses its `...` option to include them, along with your original pattern so monthlies don't break. My version of htdig doesn't seem to mind that the file doesn't exist for non-monthly lists.

> Finally, I won't be able to work on these things myself as my current
> priority is installing additional disk space.

That's ok, as long as you don't mind me throwing these patches at you from time to time. I think we're nearly there, honestly.

Paul
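P.S. To make the limit_urls_to problem concrete, here is a minimal sketch that emulates the matching as I understand it: a URL is accepted if it contains any one of the listed strings as a plain substring. The function url_allowed and the list name msgboard@example.org are invented for the illustration; this is not htdig code.

```shell
#!/bin/sh
# Rough emulation of limit_urls_to as described above (an assumption about
# htdig's behaviour, not its actual code): accept a URL if it contains ANY
# of the given strings as a literal substring. No wildcards, no AND.
url_allowed() {
    url=$1
    shift
    for pat in "$@"; do
        case $url in
            *"$pat"*) return 0 ;;   # quoted, so $pat is matched literally
        esac
    done
    return 1
}

BASE=http://localhost/sinister@majordomo.net

# 1. A bare "msg" also lets in a (hypothetical) list with "msg" in its name.
url_allowed "http://localhost/msgboard@example.org/index.html" "msg" \
    && echo "bare msg: accepts an unrelated list page"

# 2. The original prefix misses messages inside month directories.
url_allowed "$BASE/1999-month-09/msg00000.html" "$BASE/msg" \
    || echo "original prefix: month directory rejected"

# 3. One explicit prefix per month directory, or'd with the original, works.
url_allowed "$BASE/1999-month-09/msg00000.html" \
    "$BASE/msg" "$BASE/1999-month-09/msg" \
    && echo "per-month prefixes: accepted"
```

Case 3 is the reason for generating the list of per-month prefixes from the directories that actually exist rather than trying to express them as one pattern.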
--- ../bin/monthme.jeff  Sat Sep  4 17:44:09 1999
+++ monthme  Sun Sep  5 00:30:03 1999
@@ -29,6 +29,8 @@
 MAILLIST=$1
 NICKNAME=$2
 
+TARGET=http://localhost
+
 #########################
 ### Action ###
 #########################
@@ -41,9 +43,18 @@
 CTRAIL=$CONFDIR/trailer-monthly.html
 [ -f $CHEAD ] && cat $CHEAD >> $MONTHINDEX
 
+MAILLIST=$(echo $MAILLIST | awk '{ print tolower($1) }')
+ESCAPED_NAME=$(echo $MAILLIST | tr '@.' '__')
+
 # Start off the page
 cat >> $MONTHINDEX <<EOF
-Monthly index for <strong>$NICKNAME</strong> mailing list:
+<h2>Monthly index for the $NICKNAME mailing list</h2>
+<a name="months">
+<a href="latest/maillist.html">Latest Messages by Date</a>
+<br>
+<a href="latest/index.html">Latest Messages by Thread</a>
+<p>
+<strong>By month:</strong>
 <ul>
 EOF
 
@@ -59,13 +70,43 @@
 # End month list and start search section
 cat >> $MONTHINDEX <<EOF
 </ul>
+<h2>Search $NICKNAME</h2>
+<a name="search">
 <FORM ACTION="/cgi-bin/htsearch" METHOD=GET>
 <INPUT TYPE="text" NAME="words" VALUE="" size=35>
 <INPUT TYPE="submit" VALUE="Search">
-<BR>
-<INPUT TYPE="checkbox" NAME="restricttofiles" VALUE="on">
-Restrict matched files
-<SELECT NAME="filelist" SIZE=3 MULTIPLE>
+<p>
+<strong>Search options:</strong>
+<p>
+<table><tr><td>
+Match: <select name=method>
+<option value=and selected>All
+<option value=or>Any
+<option value=boolean>Boolean
+</select>
+</td><td>
+Format: <select name=format>
+<option value="builtin-long">Long
+<option value="builtin-short">Short
+</select>
+</td><td>
+Sort by: <select name=sort>
+<option value="score">Score
+<option value="date">Date
+<option value="title">Name
+</select>
+</td></tr>
+<tr><td>
+Results per page: <select name=matchesperpage>
+<option value=10 selected>10
+<option value=20>20
+<option value=50>50
+<option value=100>100
+</select>
+</td><td>
+Restrict search to months:
+</td><td>
+<select name=restrict size=3 multiple>
 EOF
 
 # Optional button for searching within each month
@@ -75,11 +116,9 @@
 
 # Finish off search section
 cat >> $MONTHINDEX <<EOF
-<BR>
 </SELECT>
-<INPUT TYPE="hidden" NAME="restrict" VALUE="filelist">
-<INPUT TYPE="hidden" NAME="method" VALUE="and">
-<INPUT TYPE="hidden" NAME="format" VALUE="short">
+</td></tr>
+</table>
 <INPUT TYPE="hidden" NAME="config" VALUE="$ESCAPED_NAME">
 <INPUT TYPE="hidden" NAME="exclude" VALUE="">
 </FORM>
@@ -90,4 +129,16 @@
 ln -s $HOME/archive/$MAILLIST/maillist.html \
     $HOME/archive/$MAILLIST/index.html
 
-
+# Compile list of subdirs for htdig to index
+HTDIGINC=$HOME/vault/$ESCAPED_NAME/htdiginc
+cat > $HTDIGINC < /dev/null
+for MONTH in $MONTHDIRS ; do
+    echo "$TARGET/$MAILLIST/$MONTH/msg" >> $HTDIGINC
+    echo "$TARGET/$MAILLIST/$MONTH/maillist.html" >> $HTDIGINC
+done
+
+# Create link to latest indexes
+# Note: not "this month" - might be no messages yet.
+LATESTM=`echo "$MONTHDIRS" | sort -nr | head -1`
+rm -f $HOME/archive/$MAILLIST/latest
+ln -s $LATESTM $HOME/archive/$MAILLIST/latest
--- ../bin/digger.jeff  Sun Aug 29 19:26:14 1999
+++ digger  Sun Sep  5 00:27:05 1999
@@ -87,7 +87,7 @@
 
     echo "nothing_found_file: $CONF/nomatch.html" >> $CFG
     echo "search_results_wrapper: $CONF/wrapper.html" >> $CFG
-    echo "limit_urls_to: $TARGET/$MAILLIST/msg" >> $CFG
+    echo "limit_urls_to: \`$VAULT/$ESCAPED_NAME/htdiginc\` +$TARGET/$MAILLIST/msg" >> $CFG
     echo "exclude_urls: .dir" >> $CFG
     echo "max_head_length: 10000" >> $CFG
     echo "remove_bad_urls: true" >> $CFG