[Mailman-Users] Retrieving individual messages from raw Mailman mboxes via http

2003-10-05 Thread Andy Sy
I am thinking of adapting a browser-based threaded message
interface I made (see http://www.neotitans.com/page.gif) to work
with GNU Mailman lists, among other things.  The interface is
collapsible and outlinable, needs no page refresh, and loads
message bodies on demand.
Ideally, I would like it to function as a 'www-interface
gateway' that works with all existing Mailman raw archives.
From what I've been researching, one would need some kind of
index into the raw mbox file, either a mail summary file format
or a database which would contain file seek pointers into
the raw mbox.
It would then use a ranged HTTP request to retrieve only
the particular message body it needs to display (would this
work?). Several issues arise which I'd be glad to have input
on from the experts on this list:
I. Which mbox index / mail summary file format to use?

The Mozilla .msf format looks like a strong candidate.
Does anyone have other suggestions?  Does Mailman maintain
such a mail summary file and is it publicly accessible
by default?
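
For illustration only, the simplest index I can think of is a plain
list of byte offsets, one (start, end) pair per message.  The Python
sketch below is my own assumption about how a gateway might build
one; it is not the Mozilla .msf format and not something Mailman
provides.  build_offset_index is a hypothetical name, and the code
relies on the usual mbox convention that a message starts at a line
beginning with "From " that follows a blank line (or the start of
the file).  An unquoted "From " line in a message body that follows a
blank line would be misread as a separator.

```python
import io


def build_offset_index(stream):
    """Scan a binary mbox stream; return [(start, end), ...] byte offsets.

    `end` is exclusive, so stream bytes [start:end] hold one whole message.
    """
    starts = []
    pos = 0
    prev_blank = True  # the start of the file counts as a boundary
    for line in stream:
        if prev_blank and line.startswith(b"From "):
            starts.append(pos)
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)
    ends = starts[1:] + [pos]  # each message ends where the next begins
    return list(zip(starts, ends))


# Tiny in-memory mbox used only to exercise the function.
sample = (
    b"From a@x Sun Oct  5 00:00:00 2003\nSubject: one\n\nbody one\n\n"
    b"From b@x Sun Oct  5 00:01:00 2003\nSubject: two\n\nbody two\n"
)
index = build_offset_index(io.BytesIO(sample))
```

A real summary file would also carry subject, sender, and threading
headers per message; the offsets are only the part needed to locate a
message body.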
II. index / mail summary file performance and maintenance

Mozilla .msf files can be regenerated on the fly, but for a 100MB
mailbox this already takes a few minutes (and Python-list's mailbox
is 600MB+!).  Assuming index file corruption is very rare, this
should not be a real problem.
III. index / mail summary file hosting issues

If an index/mail summary file is not available by default, and
such a www-interface gateway were to work with no additional work
on the list manager's part, then the index/mail summary file would
have to be generated by the machine hosting the gateway instead.
- What then, would be the mechanics of the (constant) remote reindexing
that would need to be done as new messages come in?  Would it be possible
to just constantly poll the size of the raw mbox and if it has changed,
to just reindex using data starting from the last retrieved file position?
- How often do list admins compact/expunge their raw archive mboxes?
Every time they do, afaik, the index / mail summary file would have to
be regenerated.
- Is it possible, then, for the www-interface gateway to automatically sense
if the remotely hosted raw archive mbox has been expunged/compacted?
- Also, how would the www-interface gateway machine know when its index
/ mail summary file has been corrupted?
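
One way the polling questions above could be approached, sketched in
Python under my own assumptions: the gateway remembers the size of the
mbox it last indexed plus the "From " separator line of the last
message it indexed.  On each poll it compares the remote size and
re-fetches a few bytes at the remembered offset (the "probe").  A
smaller file or a mismatched probe suggests the mbox was compacted or
rewritten, so the index is rebuilt; a larger file with a matching
probe means only the tail needs indexing.  plan_update and index_tail
are hypothetical names, and how the size and probe are fetched is left
out.

```python
def plan_update(indexed_size, expected_line, remote_size, probe):
    """Return "unchanged", "append", or "rebuild" for the local index.

    expected_line: separator line of the last message already indexed.
    probe: bytes now found in the remote mbox at that message's offset.
    """
    if remote_size < indexed_size:
        return "rebuild"  # file shrank: expunged or compacted
    if not probe.startswith(expected_line):
        return "rebuild"  # old offsets no longer point at the same message
    if remote_size == indexed_size:
        return "unchanged"
    return "append"


def index_tail(tail, base_offset):
    """Index only newly appended bytes; offsets are relative to the whole file."""
    starts = []
    pos = 0
    prev_blank = True  # assumes the previously indexed part ended on a boundary
    for line in tail.splitlines(keepends=True):
        if prev_blank and line.startswith(b"From "):
            starts.append(base_offset + pos)
        prev_blank = line in (b"\n", b"\r\n")
        pos += len(line)
    ends = starts[1:] + [base_offset + pos]
    return list(zip(starts, ends))
```

This check is a heuristic.  A rewrite that leaves the last indexed
message at the same offset would go unnoticed, so an occasional full
rebuild is probably still wise.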
A second possible approach would be for the www-interface gateway machine
to maintain its own copy of the raw archive and constantly rsync it with
the one maintained by the list admin.  This will probably only be feasible
if Mailman list admins provide rsync access to the raw mailbox archive.
- Would adding rsync serving of the raw mailbox to Mailman be a good idea?
(If it were built into Mailman, it would be more likely to be enabled
by default.)


--
=
reply-to: a n d y @ n e t f x p h . c o m
http://www.neotitans.com
Web and Software Development


--
Mailman-Users mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
This message was sent to: [EMAIL PROTECTED]
Unsubscribe or change your options at
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org


[Mailman-Users] rsync'able Mailman archive

2003-10-05 Thread Andy Sy
For a huge list like http://www.libsdl.org/pipermail/sdl/ (104MB),
being able to download the raw archive is a blessing.  For those
newsgroups which have (hallelujah) Mailman versions, this means
no more fiddling with infernal newsreader behaviour when it comes
to retrieving big newsgroups.
Just:

1. Fetch the huge archive efficiently with your favorite download client.
2. Drop the mbox format file into your mail client directory (like
Mozilla Mail)
3. voila!  A configurable, collapsible-threadable, searchable,
completely complete (yeah!) local version of the whole list.
PROBLEM
===
is... how do you update your local mbox copy efficiently?  I
can, for example, get a snapshot of the SDL archives as they
stand today, subscribe to the mailing list, and basically
ensure my local 'mirror' is both complete and up-to-date.
BUT... what if I want to stop subscribing for a while, and then
3 months down the road I want a complete version of the mailing
list again?  I would then have to download the (now larger)
full archive all over again.
The solution is to make the archive downloads rsync'able.
Which brings me to the topic of this post:
QUESTION 1
==
Is making the archive rsync'able a responsibility of the
list administrator or... wouldn't it be better to build
such support into Mailman?  If rsync access were provided
by default and thus widely available, it would allow those
people who are OC about getting complete lists to save time
and avoid wasting gobs of bandwidth by loading and reloading
the full archive multiple times.
QUESTION 2
==
Once I start subscribing to the list, my local mbox-format
copy of the list is now being updated locally and thus will
not be byte-for-byte identical to the Mailman archive.  If
I then rsync said mailbox with Mailman's, will the differences
then be 'repaired' efficiently (like rsync is supposed to do)?
A HACKISH WORKAROUND IN THE ABSENCE OF RSYNC SUPPORT

Deliberately truncate your local mbox copy to a size no larger
than it was when you FIRST updated it locally (you have to
remember what its size was at that point in time), and then
resume the transfer from that point.
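
A minimal Python sketch of that workaround, under stated assumptions:
truncate_for_resume is a hypothetical helper, pristine_size is the
size you noted before the first local modification, and the returned
string is the value for an HTTP Range header asking the server for
everything from that byte onward.  It only helps if the server honours
range requests and the archive has not been rewritten on the server
since your download.

```python
import os


def truncate_for_resume(path, pristine_size):
    """Cut the local mbox back to at most pristine_size bytes.

    Returns the Range header value that resumes the download from there.
    """
    keep = min(os.path.getsize(path), pristine_size)
    with open(path, "r+b") as f:
        f.truncate(keep)  # discard everything written after the download
    return f"bytes={keep}-"
```

Download clients that can resume a partial file do the same thing
once the local copy has been truncated.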


--
=
reply-to: a n d y @ n e t f x p h . c o m
http://www.neotitans.com
Web and Software Development






[Mailman-Users] A www interface to Mailman mbox archives

2003-10-06 Thread Andy Sy
Hi all, I would like to interface a DHTML-based, collapsible
threaded message interface I've developed to GNU Mailman
lists.
The key technique relies on building an index (a la mail summary
files) of the raw mbox and retrieving individual messages from it
with HTTP ranged requests on an as-needed basis.
My question for the list admins out there is:

Do you find yourselves expunging / compacting the raw mbox?  And
if so, how often?
Once I index the raw mbox, can I count on the present messages in
it to remain in the same positions and only have to check / index
on the new messages appended to the end?
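
To make the ranged retrieval concrete, here is a small Python sketch
of the request I have in mind.  message_request is a hypothetical
helper, the URL in the example is a placeholder, and start / end are
assumed to come from the index as byte offsets with end exclusive.
HTTP byte ranges are inclusive at both ends, hence the end - 1.

```python
import urllib.request


def message_request(mbox_url, start, end):
    """Build a GET request for bytes [start, end) of the remote mbox."""
    last = end - 1  # HTTP byte ranges include the last byte
    return urllib.request.Request(
        mbox_url, headers={"Range": f"bytes={start}-{last}"}
    )


# Example with a placeholder URL; nothing is sent until urlopen() is called.
req = message_request("http://example.org/list.mbox", 100, 250)
```

A server that honours the range answers 206 Partial Content with just
those bytes.  One that ignores it answers 200 with the whole file, so
the gateway would need to check the status before trusting the body.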
--
=
reply-to: a n d y @ n e t f x p h . c o m
http://www.neotitans.com
Web and Software Development
"Bring back reply-to munging.  People are sick of getting duplicate
messages and I'm sick of removing CCs each time I do reply-all to a
list message! And, NO, changing email clients is NOT the answer."

