Jeff said:

> I took a closer look. Attachments are never indexed because of
> 
>     echo "exclude_urls:         .dir"                        >> $CFG

I'm confused here: attachments don't appear in anything called *.dir,
but in a subdirectory called msg????? for the relevant message - see

http://www.mail-archive.com/sinister%40majordomo.net/1997-month-08/msg00174.html

Certainly on my home setup they get indexed.  If I run:

htdig -a -c /etc/htdig/sinister_majordomo_net.conf -vvv

it shows them being indexed here.  And yes, we could restrict it to just
html files, but again, because limit_urls_to is so hopeless, I think it
would mean throwing away the directory check: the only thing you could
put there would be ".html".

> So I now feel safe doing your original
> 
> -    echo "limit_urls_to:        $TARGET/$MAILLIST/"          >> $CFG
> +    echo "limit_urls_to:        $TARGET/$MAILLIST/msg"       >> $CFG

- but the other way round, surely?  My change no. 1 yesterday was just to
take the "msg" away - a bit of a dirty kludge, but it preserved path info,
thus my patch no. 2 which just completely reproduced your limits (including
"msg") but for all named subdirectories too.  So patch #2 I sent more
exactly traces what you did for non-monthlies, with the cost of a little
more complexity.  I'm sure either would do, as long as
"$TARGET/$MAILLIST/msg" itself isn't hardcoded in, as that will break
monthlies because it misses all the monthly subdirectories.
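
To make the patch #2 idea concrete, here is a rough sketch only, not the
actual patch: one limit per subdirectory, each keeping the "msg" prefix.
The function name build_limits and the local archive directory argument
are made up for illustration. It also assumes that limit_urls_to accepts
several space-separated patterns, that subdirectory names contain no
spaces, and that directories named msg* hold attachments (already covered
by the ".../msg" prefix) rather than monthly archives.

```shell
#!/bin/sh
# Hypothetical helper, not part of the real script.
# Prints one limit_urls_to line covering msg* at the top level and
# msg* inside every other subdirectory of the archive.
#   $1 = URL prefix for the list, e.g. "$TARGET/$MAILLIST"
#   $2 = local directory holding the archive (assumed layout)
build_limits() {
    base=$1
    dir=$2
    limits="$base/msg"
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue        # the glob matched nothing
        name=$(basename "$sub")
        case $name in
            msg*) continue ;;            # attachment dir, already covered
        esac
        limits="$limits $base/$name/msg"
    done
    printf 'limit_urls_to:        %s\n' "$limits"
}
```

It would be called in place of the single echo, along the lines of
build_limits "$TARGET/$MAILLIST" "$ARCHIVE_DIR" >> $CFG, where ARCHIVE_DIR
is again just a placeholder name.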

> By the way, index pages are already protected
> from htdig in exactly the correct fashion (don't index the page,
> but do follow the links) by the META tag at the top of the page.

Ah yes, I missed that - you'll maybe want to add that to the
monthme-generated page too then?

Re. dates:

> Solution sets that I see are:
> 
>   a1) ask mailme to do monthly sorts off of received headers
>   a2) find another way to do imports
> 
>   b1) ask mailme to do monthly sorts off of x-archive-with-date
>       headers primarily, received headers secondarily
>   b2) modify bounce.pl to generate x-archive-with-date based on
>       received headers.
> 
> What I don't like is the extra complexity that all this will entail,
> and also the fact that it would require duplicating some of the
> functionality already found in MHonArc. Again, I'm going to ignore
> this all for a while and worry about installing bigger disks.

All I'd say is I don't think it's mailme's job to sort this out: the
problem lies in the data source, and mailme can't really, and shouldn't,
second-guess what the data *really* meant.  Previous discussions on gossip
have leaned strongly towards this sorting order:

x-archive-with-date:received:date

which archives new data "correctly" (relies on your local server
clock) but allows imported data to be dealt with specially if the
list manager's prepared to make an effort.  So I'd strongly favour
b), and yes, why not make bounce.pl use the original Received:
headers.  Good plan, I wish I had :)  If anyone wants a lesson in why
relying on Date: isn't a good idea on a reasonably sized (1000
members) list, visit 

http://www.mail-archive.com/sinister%40majordomo.net/
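
For what it's worth, the precedence above could be sketched like this.
It is an illustration only, not how mailme or bounce.pl actually work.
The function name pick_archive_date is made up, and it assumes the
topmost Received: header is the one stamped by the local server, with
its timestamp after the last semicolon.

```shell
#!/bin/sh
# Hypothetical helper: print the date string that the precedence
# x-archive-with-date:received:date would select for one message.
#   $1 = file containing a single message (headers, blank line, body)
pick_archive_date() {
    # Unfold continuation lines; stop at the blank line ending the headers.
    headers=$(awk '
        /^$/      { exit }
        /^[ \t]/  { sub(/^[ \t]+/, " "); buf = buf $0; next }
                  { if (buf != "") print buf; buf = $0 }
        END       { if (buf != "") print buf }
    ' "$1")

    for name in x-archive-with-date received date; do
        value=$(printf '%s\n' "$headers" | awk -v want="$name" '
            {
                colon = index($0, ":")
                if (colon == 0) next
                if (tolower(substr($0, 1, colon - 1)) != want) next
                val = substr($0, colon + 1)
                # Assumption: a Received timestamp follows the last semicolon.
                if (want == "received") sub(/^.*;/, "", val)
                sub(/^[ \t]+/, "", val)
                print val
                exit    # first match is the topmost header of that name
            }')
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1    # none of the three headers was present
}
```

An imported message given an X-Archive-With-Date: header is filed under
that date; everything else falls back to the local Received: stamp, and
Date: is only consulted when neither of the others exists.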

I'll sort it out when my list's up to date, Jeff, I promise.

Paul
