On Fri, Feb 26, 2010 at 7:15 PM, Mark Sapiro <m...@msapiro.net> wrote:
> On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
>> On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <m...@msapiro.net>
>> wrote:
>>
>>> Cedric Jeanneret wrote:
>>>>
>>>> I'm trying to create a xapian[1] indexer for our mailing list. As
>>>> mailman is written in Python and there are python bindings for
>>>> xapian, I guess I can maybe create a plugin for that. My first
>>>> question is: is there already such a thing? I searched on the
>>>> net, but nothing appeared. My second one: can we create a plugin
>>>> for mailman, and if so, where should I go to find some doc? It
>>>> seems there's nothing in the wiki
>>>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
>>>>
>>>> Just to explain why I'd like to do that: we already have a xapian
>>>> search engine in here, indexing a fileserver, request tracker
>>>> queues and moinmoin wikis... so we'd like to aggregate all our
>>>> stuff in one app for searching.
>>>
>>> This will be quite doable with Mailman 3 which is still in
>>> development.
>>>
>>> There are problems trying to do this in Mailman 2.1.x. There is a
>>> plugin capability of sorts in the form of custom handlers that can
>>> be added to the incoming message processing pipeline. See the FAQ
>>> at <http://wiki.list.org/x/l4A9>. However, archiving is
>>> asynchronous with incoming message processing, so it is not
>>> possible for a custom handler to know the URL that will ultimately
>>> retrieve the message from the archive.
>>>
>>> A different approach which might be workable is to use the
>>> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
>>> you set
>>>
>>> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
>>> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
>>>
>>> in mm_cfg.py, then that script will be invoked to do the archiving.
>>> The script in turn could invoke the standard pipermail archiving
>>> process and then invoke xapian to index the archived message.
>>>
>>
>> Hello again,
>>
>> Just one question: what do mlist, msg, msgdata stand for? As I read,
>> I have to create my module and define a "process(mlist, msg, msgdata)"
>> inside it, so I'd like to know what those objects are. I discovered
>> that mlist stands for a Mailman.MailList.MailList('list-name'), but
>> for the others, it's a bit hard to find...
>
> Only custom handlers need to define process(mlist, msg, msgdata). That
> is the entry point to the handler, and three objects are passed:
>
> mlist is the Mailman.MailList.MailList() instance for the current list
>
> msg is a Mailman.Message.Message() (subclass of email.Message.Message)
> instance for the current message
>
> msgdata is a dictionary of the message metadata accumulated so far.
>
> The important thing is these are passed in as arguments to the handler
> process() function.
>
> In your case, you are defining a module which is going to be invoked
> like the following.
>
> Suppose that
>
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
>
> It will be invoked in a pipe similar to
>
> cat raw_message | /path/to/myarch.py HOST LIST
>
> i.e. the command string with %(hostname)s and %(listname)s replaced by
> the actual host name and list name of the list will be invoked and the
> message piped to it.
>
> So, it could begin something like:
>
> #!/usr/bin/env python
> import sys
> sys.path.insert(0, '/path/to/mailman/bin')
> # The above line can be skipped if myarch.py is in Mailman's
> # bin directory.
> import paths
>
> import email
> from Mailman import MailList
> from Mailman import Message
>
> # Parse the piped-in message as a Mailman Message instance.
> msg = email.message_from_file(sys.stdin, Message.Message)
> # With the command above, argv[1] is HOST and argv[2] is LIST.
> mlist = MailList.MailList(sys.argv[2], lock=True)
>
> At this point, you have a list object (locked) and a message object. You
> might think you could just do
>
> mlist.ArchiveMail(msg)
>
> to archive the mail to the listname.mbox file and the pipermail archive,
> but that wouldn't quite work because that method would re-invoke the
> external archiver. Also, you don't need to worry about the listname.mbox
> file because the ArchiveMail() method already did that before invoking
> the external archiver, so what you would need is
>
> from Mailman.Archiver import HyperArch
> from cStringIO import StringIO
> f = StringIO(str(msg))
> h = HyperArch.HyperArchive(mlist)
> h.processUnixMailbox(f)
> h.close()
> f.close()
>
> Which is what the ArchiveMail() method would do. Now you still have the
> mlist and msg objects, and you need to save and unlock the list at some
> point
>
> mlist.Save()
> mlist.Unlock()
>
> and the message is now in the pipermail archive and can be indexed.
>
> --
> Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>
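
A note on argument order in the external archiver command. The sketch below assumes the placeholders are filled by ordinary Python dict-based % formatting, which the %(name)s syntax suggests but the message above does not state outright. The host name, list name, and the helpers expand_command and split_arguments are hypothetical and exist only for illustration.

```python
import shlex


def expand_command(template, hostname, listname):
    """Fill the %(hostname)s and %(listname)s placeholders in a command string."""
    # Assumption: plain dict-based %-formatting, with keys named after
    # the placeholders shown in the message above.
    return template % {'hostname': hostname, 'listname': listname}


def split_arguments(command):
    """Split the expanded command the way a shell would, giving the script's argv."""
    return shlex.split(command)


if __name__ == '__main__':
    # Hypothetical host and list names, for illustration only.
    template = '/path/to/myarch.py %(hostname)s %(listname)s'
    command = expand_command(template, 'lists.example.org', 'mylist')
    argv = split_arguments(command)
    print(command)
    print('host name is argv[1]: %s' % argv[1])
    print('list name is argv[2]: %s' % argv[2])
```

With the host name placed first in the command string, the list name reaches the script as sys.argv[2], so that is the value to hand to MailList.MailList(). Putting %(listname)s first in the setting would make it sys.argv[1] instead.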
wow, thanks a lot, with all this I'll be able to do what I want! I'll
post all my stuff as soon as I've done it, hopefully next week :).

Thanks again.

Best regards,

C.

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org