Hi Jeff,

First off, I made a silly mistake in monthme: linking to HTML files in
subdirectories doesn't work, of course, because none of the relative
links in those files work.  This is quickly solved by linking to the
directory instead, which makes a lot more sense anyhow.  The patch
I've attached (see below) fixes this.

> I applied your patch to the monthly index page generator -- and it
> looks like quite an improvement. However, relying on the 'Date:'
> header is definitely hurting, and something is inconsistent.  This may
> take some work, possibly including perhaps bounce.pl modifications.
> See:
> 
> http://www.mail-archive.com/sinister@majordomo.net/1904-month-06/maillist.html
> http://www.mail-archive.com/sinister@majordomo.net/1904-month-06/msg00000.html

I just saw these and nearly died - actually the code's doing what it
should: these dates were actually *in* the mails - misconfigured clients.
I'd skimmed through the discussion in gossip on "received date" and
stupidly assumed I hadn't a problem with my list.  Those who don't learn
from history are doomed to repeat it.  So I don't think it's a code
problem; it goes back instead to the question of how to manage imports.
Maybe bounce.pl could do a sanity check on "Date:" vs. "Received:"
within some bounds (Michael?) but that's all I can suggest.  Right now I
can't see anywhere the mailme/monthme code is flaky in its date handling,
although I could be missing something.
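
To make the "within some bounds" idea concrete, here is a rough sketch in
shell (not bounce.pl itself, which is Perl).  The function name
date_is_sane, the seven-day default bound, and the reliance on GNU date's
-d option are all my assumptions, not anything in the existing code.  It
takes the two timestamps as already-extracted strings and succeeds only
if they are close together.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Date: value lies within MAX_SKEW
# seconds of the Received: timestamp.  Assumes GNU date (for -d).
date_is_sane() {
    hdr_date=$1
    rcv_date=$2
    max_skew=${3:-604800}   # default bound of 7 days is an arbitrary choice

    # Convert both timestamps to seconds since the epoch; an unparseable
    # date counts as "not sane".
    d=$(date -d "$hdr_date" +%s 2>/dev/null) || return 1
    r=$(date -d "$rcv_date" +%s 2>/dev/null) || return 1

    # Absolute difference between the two timestamps.
    diff=$((d - r))
    [ "$diff" -lt 0 ] && diff=$((0 - diff))

    [ "$diff" -le "$max_skew" ]
}
```

A caller could fall back to the "Received:" timestamp (or hold the
message for review) whenever the check fails; which of those is right is
a policy question this sketch doesn't answer.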

Jeff - sorry for these and I think it's best to let my list accumulate now
- I'll try and deal with it later and maybe ask you just to zap those, if I
can make it easy for you to do: I'll send a simple script or something if
that's ok.

> The patches to digger/rcfile are good, but I am concerned about htdig
> trying to index attachments, which I don't want it to do.  I'm afraid
> of htdig getting bogged down with some weird mime type.  Thus, I
> haven't applied the patches until this issue gets
> resolved. Suggestions?

Yes, htdig going somewhere it shouldn't might be a problem, I can see
that.  But Jeff, hasn't it been indexing attachments anyway already?
limit_urls_to was set to foo/msg which is where the attachments are stored
too, isn't it?

But looking back I think you're right that my solution (removing "msg"!)
isn't very robust.  In case anyone's following this, the problem is how to
ask htdig to index the following directories under a root starting point of
maillist.html:

msg*.html
????-month-??/msg*.html

You can add multiple "patterns" to limit_urls_to, but you can't use
wildcards ("patterns"??) and multiple strings are or'd, not and'd.  And you
don't know how many ????-month-??'s there are.  And if you just set it to
"msg", you run the risk of a list with "msg" in its name.  You also can't
do anything creative with a combination of "limit_urls_to" and
"limit_normalized" (I tried).  Oh and you can't use "exclude_urls" because
you don't know index file names for customised lists and whatever else
might end up in there.
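
To illustrate the matching problem, here is a toy stand-in for the URL
filter, assuming it is a plain substring test with the strings or'd
together, as described above.  The function url_allowed and the example
list names are made up for the illustration; this is not htdig code.

```shell
#!/bin/sh
# Hypothetical stand-in for the limit_urls_to check: succeed if the URL
# contains any one of the given strings (substring match, or'd together).
url_allowed() {
    url=$1
    shift
    for pat in "$@"; do
        case $url in
            *"$pat"*) return 0 ;;
        esac
    done
    return 1
}

# The original pattern misses messages in monthly subdirectories:
url_allowed "http://localhost/list@example.com/1999-month-09/msg00001.html" \
            "http://localhost/list@example.com/msg" || echo "monthly message skipped"

# A bare "msg" lets in every page of a list with "msg" in its name:
url_allowed "http://localhost/msgtalk@example.com/custom-index.html" \
            "msg" && echo "unwanted page indexed"
```

Listing each month's prefix explicitly avoids both failure modes, at the
cost of having to regenerate the list whenever a new month appears.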

I'm sending you two more diffs with my proposed solution - again diffs
from your original source.  I propose monthme writes out a list of files
for htdig to index, and htdig in digger uses its `...` option to include
them, along with your original pattern so monthlies don't break.  My
version of htdig doesn't seem to mind that the file doesn't exist for
non-monthly lists.

> Finally, I won't be able to work on these things myself as my current
> priority is installing additional disk space.

That's ok as long as you don't mind me throwing these patches at you
from time to time.  I think we're nearly there, honestly.

Paul
--- ../bin/monthme.jeff Sat Sep  4 17:44:09 1999
+++ monthme     Sun Sep  5 00:30:03 1999
@@ -29,6 +29,8 @@
 MAILLIST=$1
 NICKNAME=$2
 
+TARGET=http://localhost
+
 #########################
 ###   Action          ###
 #########################
@@ -41,9 +43,18 @@
 CTRAIL=$CONFDIR/trailer-monthly.html
 [ -f $CHEAD ] && cat $CHEAD >> $MONTHINDEX
 
+MAILLIST=$(echo $MAILLIST | awk '{ print tolower($1) }')
+ESCAPED_NAME=$(echo $MAILLIST | tr '@.' '__')
+
 # Start off the page
 cat >> $MONTHINDEX <<EOF
-Monthly index for <strong>$NICKNAME</strong> mailing list:
+<h2>Monthly index for the $NICKNAME mailing list</h2>
+<a name="months"></a>
+<a href="latest/maillist.html">Latest Messages by Date</a>
+<br>
+<a href="latest/index.html">Latest Messages by Thread</a>
+<p>
+<strong>By month:</strong>
 <ul>
 EOF
 
@@ -59,13 +70,43 @@
 # End month list and start search section
 cat >> $MONTHINDEX <<EOF
 </ul>
+<h2>Search $NICKNAME</h2>
+<a name="search"></a>
 <FORM ACTION="/cgi-bin/htsearch" METHOD=GET>
 <INPUT TYPE="text"     NAME="words"   VALUE="" size=35>
 <INPUT TYPE="submit"   VALUE="Search">
-<BR>
-<INPUT TYPE="checkbox" NAME="restricttofiles" VALUE="on">
-Restrict matched files
-<SELECT NAME="filelist" SIZE=3 MULTIPLE>
+<p>
+<strong>Search options:</strong>
+<p>
+<table><tr><td>
+Match: <select name=method>
+<option value=and selected>All
+<option value=or>Any
+<option value=boolean>Boolean
+</select>
+</td><td>
+Format: <select name=format>
+<option value="builtin-long">Long
+<option value="builtin-short">Short
+</select>
+</td><td>
+Sort by: <select name=sort>
+<option value="score">Score
+<option value="date">Date
+<option value="title">Name
+</select>
+</td></tr>
+<tr><td>
+Results per page: <select name=matchesperpage>
+<option value=10 selected>10
+<option value=20>20
+<option value=50>50
+<option value=100>100
+</select>
+</td><td>
+Restrict search to months:
+</td><td>
+<select name=restrict size=3 multiple>
 EOF
 
 # Optional button for searching within each month
@@ -75,11 +116,9 @@
 
 # Finish off search section
 cat >> $MONTHINDEX <<EOF
-<BR>
 </SELECT>
-<INPUT TYPE="hidden"   NAME="restrict" VALUE="filelist">
-<INPUT TYPE="hidden"   NAME="method"  VALUE="and">
-<INPUT TYPE="hidden"   NAME="format"  VALUE="short">
+</td></tr>
+</table>
 <INPUT TYPE="hidden"   NAME="config"  VALUE="$ESCAPED_NAME">
 <INPUT TYPE="hidden"   NAME="exclude" VALUE="">
 </FORM>
@@ -90,4 +129,16 @@
 ln -s $HOME/archive/$MAILLIST/maillist.html \
 $HOME/archive/$MAILLIST/index.html
 
-
+# Compile list of subdirs for htdig to index
+HTDIGINC=$HOME/vault/$ESCAPED_NAME/htdiginc
+cat > $HTDIGINC < /dev/null
+for MONTH in $MONTHDIRS ; do
+  echo "$TARGET/$MAILLIST/$MONTH/msg" >> $HTDIGINC
+  echo "$TARGET/$MAILLIST/$MONTH/maillist.html" >> $HTDIGINC
+done
+  
+# Create link to latest indexes
+# Note: not "this month" - might be no messages yet.
+LATESTM=`echo "$MONTHDIRS" | sort -nr | head -1`
+rm -f $HOME/archive/$MAILLIST/latest
+ln -s $LATESTM $HOME/archive/$MAILLIST/latest
--- ../bin/digger.jeff  Sun Aug 29 19:26:14 1999
+++ digger      Sun Sep  5 00:27:05 1999
@@ -87,7 +87,7 @@
     echo "nothing_found_file:     $CONF/nomatch.html"        >> $CFG
     echo "search_results_wrapper: $CONF/wrapper.html"        >> $CFG
 
-    echo "limit_urls_to:        $TARGET/$MAILLIST/msg"       >> $CFG
+    echo "limit_urls_to:        \`$VAULT/$ESCAPED_NAME/htdiginc\` $TARGET/$MAILLIST/msg"           >> $CFG
     echo "exclude_urls:         .dir"                        >> $CFG
     echo "max_head_length:      10000"                       >> $CFG
     echo "remove_bad_urls:      true"                        >> $CFG
