On Fri, 26 Feb 2010 10:15:13 -0800 Mark Sapiro <m...@msapiro.net> wrote:
> On 2/26/2010 4:20 AM, Cedric Jeanneret wrote:
> > On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro <m...@msapiro.net>
> > wrote:
> >
> >> Cedric Jeanneret wrote:
> >>>
> >>> I'm trying to create a xapian[1] indexer for our mailing list. As
> >>> mailman is written in Python and there are python bindings for
> >>> xapian, I guess I can maybe create a plugin for that. My first
> >>> question is: is there already such a thing? I searched on the
> >>> net, but nothing appeared. My second one: can we create a plugin
> >>> for mailman, and if so, where should I go to find some doc? It seems
> >>> there's nothing in the wiki
> >>> (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=plugin&searchQuery.spaceKey=conf_all)
> >>>
> >>> Just to explain why I'd like to do that: we already have a xapian
> >>> search engine in here, indexing a fileserver, request tracker queues
> >>> and moinmoin wikis... so we'd like to aggregate all our stuff in one
> >>> app for searching.
> >>
> >>
> >> This will be quite doable with Mailman 3 which is still in
> >> development.
> >>
> >> There are problems trying to do this in Mailman 2.1.x. There is a
> >> plugin capability of sorts in the form of custom handlers that can
> >> be added to the incoming message processing pipeline. See the FAQ
> >> at <http://wiki.list.org/x/l4A9>. However, archiving is
> >> asynchronous with incoming message processing, so it is not
> >> possible for a custom handler to know the URL that will ultimately
> >> retrieve the message from the archive.
> >>
> >> A different approach which might be workable is to use the
> >> PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If
> >> you set
> >>
> >> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
> >> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
> >>
> >> in mm_cfg.py, then that script will be invoked to do the archiving.
> >> The script in turn could invoke the standard pipermail archiving
> >> process and then invoke xapian to index the archived message.
> >>
> >
> >
> > Hello again,
> >
> > Just one question: what do mlist, msg, msgdata stand for? As I read,
> > I have to create my module and define a "process(mlist, msg, msgdata)"
> > function inside it, so I'd like to know what those objects are. I
> > discovered that mlist stands for a
> > Mailman.MailList.MailList('list-name'), but for the others, it's a
> > bit hard to find...
>
>
> Only custom handlers need to define process(mlist, msg, msgdata). That
> is the entry point to the handler and three objects are passed
>
> mlist is the Mailman.MailList.MailList() instance for the current list
>
> msg is a Mailman.Message.Message() (subclass of email.Message.Message)
> instance for the current message
>
> msgdata is a dictionary of the message metadata accumulated so far.
>
> The important thing is these are passed in as arguments to the handler
> process() function.
>
> In your case, you are defining a module which is going to be invoked
> like the following.
>
> Suppose that
>
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'
>
> It will be invoked in a pipe similar to
>
> cat raw_message | /path/to/myarch.py HOST LIST
>
> i.e. the command string with %(hostname)s and %(listname)s replaced by
> the actual host name and list name of the list will be invoked and the
> message piped to it.
>
> So, it could begin something like:
>
> #!python
> import sys
> sys.path.insert(0, 'path/to/mailman/bin')
> # The above line can be skipped if myarch.py is in Mailman's
> # bin directory.
> import paths
>
> import email
> from Mailman import MailList
> from Mailman import Message
>
> msg = email.message_from_file(sys.stdin, Message.Message)
> mlist = MailList.MailList(sys.argv[1], lock=True)
>
>
> At this point, you have a list object (locked) and a message object. You
> might think you could just do
>
> mlist.ArchiveMail(msg)
>
> to archive the mail to the listname.mbox file and the pipermail archive,
> but that wouldn't quite work because that method would re-invoke the
> external archiver. Also, you don't need to worry about the listname.mbox
> file because the ArchiveMail() method already did that before invoking
> the external archiver, so what you would need is
>
> from Mailman.Archiver import HyperArch
> from cStringIO import StringIO
> f = StringIO(str(msg))
> h = HyperArch.HyperArchive(mlist)
> h.processUnixMailbox(f)
> h.close()
> f.close()
>
> Which is what the ArchiveMail() method would do. Now you still have the
> mlist and msg objects, and you need to save and unlock the list at some
> point
>
> mlist.Save()
> mlist.Unlock()
>
> and the message is now in the pipermail archive and can be indexed.
>

Hello again,

I'm having some trouble with my code. Following what Mark said, I've
written this:

#!/usr/bin/env python
import sys
sys.path.insert(0,'/usr/lib/mailman')
import syslog
syslog.syslog('begin script')
import email
from Mailman import MailList
from Mailman import Message
## archive part
from Mailman.Archiver import HyperArch
from cStringIO import StringIO

maillist = sys.argv[2]
hostname = sys.argv[1]

msg = email.message_from_file(sys.stdin, Message.Message)
syslog.syslog(maillist)
mlist = MailList.MailList(maillist, lock=True)
syslog.syslog('processing archiver')
## let archive it
f = StringIO(str(msg))
h = HyperArch.HyperArchive(mlist)
h.processUnixMailbox(f)
h.close()
f.close()

mlist.Save()
mlist.Unlock()

mlist.ArchiveMail(msg)

syslog.syslog('processing indexer')
### coming soon

syslog.syslog('exiting - all ok')
sys.exit(0)

The syslog calls are for debugging purposes only.

When I send an email to my mailing list, I get this kind of error:

Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking
Mar 02 12:38:33 2010 (28380)   File "/var/lib/mailman/scripts/driver", line 250, in <module>
Mar 02 12:38:33 2010 (28380)     run_main()
Mar 02 12:38:33 2010 (28380)   File "/var/lib/mailman/scripts/driver", line 110, in run_main
Mar 02 12:38:33 2010 (28380)     main()
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/Cgi/admin.py", line 167, in main
Mar 02 12:38:33 2010 (28380)     mlist.Lock()
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Mar 02 12:38:33 2010 (28380)     self.__lock.lock(timeout)
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/LockFile.py", line 306, in lock
Mar 02 12:38:33 2010 (28380)     important=True)
Mar 02 12:38:33 2010 (28380)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Mar 02 12:38:33 2010 (28380)     traceback.print_stack(file=logf)

This block is flooding my /var/log/mailman/locks.

It seems I have a problem with the lock file... Any ideas?

Thank you!

-- 
Cédric Jeanneret                 | System Administrator
021 619 10 32                    | Camptocamp SA
cedric.jeanne...@camptocamp.com  | PSE-A / EPFL
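
A side note on the command template quoted above: the way the %(hostname)s and %(listname)s placeholders turn into script arguments can be sketched with ordinary Python dictionary-based string interpolation. This is only an illustration of the substitution as Mark describes it, not Mailman's actual code; the helper name build_archiver_command and the example host name are hypothetical.

```python
def build_archiver_command(template, hostname, listname):
    """Fill the %(hostname)s and %(listname)s placeholders in a template.

    Hypothetical helper for illustration only; Mailman's own code may
    perform the substitution differently.
    """
    return template % {'hostname': hostname, 'listname': listname}


# Template in the same shape as the PUBLIC_EXTERNAL_ARCHIVER setting.
template = '/path/to/myarch.py %(hostname)s %(listname)s'

# 'lists.example.com' is a made-up host; 'toto' is the list from the log.
command = build_archiver_command(template, 'lists.example.com', 'toto')

# The invoked script sees the same pieces as sys.argv[0], [1] and [2].
script, host, name = command.split()
```

With the template in this order, the host name arrives as sys.argv[1] and the list name as sys.argv[2], which is how the script above reads them.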
------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org