Re: [Mailman-Users] Indexing mail right after delivery
Done for Launchpad. Thanks again!

On Mon, Mar 15, 2010 at 5:40 PM, Mark Sapiro m...@msapiro.net wrote:
> Cedric Jeanneret wrote:
> > Maybe we should delete my bug on launchpad, or directly link it to your FAQ page ? I just added my code in the function, and now it indexes, and archives correctly.
>
> I suggest you just delete the two existing attachments and attach your current code with a note that it is based on the template in the FAQ. That way the xappy/Xapian code will be available there if others wish to use it.
>
> --
> Mark Sapiro m...@msapiro.net          The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan

--
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Re: [Mailman-Users] Indexing mail right after delivery
On Sun, 14 Mar 2010 17:38:16 -0700 Mark Sapiro m...@msapiro.net wrote:
> To follow up on this thread, there is now a FAQ at http://wiki.list.org/x/RAKJ which contains an attached template, Ext_Arch.py, which can be used as an external archiver and which will add the message to the pipermail archive, and then call a stub function with arguments of the list name, host name, the URL to the just archived message, the file system path to the just archived message and the message object. The stub can be coded to call a search indexer or do other things one may wish to do with the archived message.

Hello Mark,

It works like magic! Thank you so much!

Maybe we should delete my bug on Launchpad, or link it directly to your FAQ page? I just added my code in the function, and now it indexes and archives correctly.

Thanks again! See you,

C.

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
Cedric Jeanneret wrote:
> Maybe we should delete my bug on launchpad, or directly link it to your FAQ page ? I just added my code in the function, and now it indexes, and archives correctly.

I suggest you just delete the two existing attachments and attach your current code with a note that it is based on the template in the FAQ. That way the xappy/Xapian code will be available there if others wish to use it.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Re: [Mailman-Users] Indexing mail right after delivery
To follow up on this thread, there is now a FAQ at http://wiki.list.org/x/RAKJ which contains an attached template, Ext_Arch.py, that can be used as an external archiver. It adds the message to the pipermail archive and then calls a stub function with these arguments: the list name, the host name, the URL of the just-archived message, the file system path to the just-archived message, and the message object.

The stub can be coded to call a search indexer or do anything else one may wish to do with the archived message.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
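As an illustration of the stub described above, here is a minimal sketch of what such a function might do with its five arguments. It is not the contents of Ext_Arch.py: the function name `build_index_record`, the shape of the returned record, and the example URL and path are all assumptions made for this example, and it uses only the standard `email` module rather than any Mailman code.

```python
import email


def build_index_record(listname, hostname, url, filepath, msg):
    """Hypothetical stub body: collect what a search indexer might need.

    The argument order follows the description in the post above
    (list name, host name, archive URL, archive file path, message object).
    """
    return {
        "list": listname,
        "host": hostname,
        "url": url,            # where the archived message can be read
        "path": filepath,      # where the archived message lives on disk
        "subject": msg.get("Subject", ""),
        "from": msg.get("From", ""),
        "message_id": msg.get("Message-ID", ""),
    }


# Example call with made-up values.
sample = email.message_from_string(
    "From: user@example.com\n"
    "Subject: Hello list\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "Body text\n"
)
record = build_index_record(
    "toto",
    "lists.example.com",
    "http://lists.example.com/pipermail/toto/000001.html",
    "/path/to/archives/toto/000001.html",
    sample,
)
```

A real stub would hand `record` (or the file at `filepath`) to the indexer of choice, for example the xappy/Xapian code mentioned elsewhere in this thread.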
Re: [Mailman-Users] Indexing mail right after delivery
On Wed, 03 Mar 2010 10:04:31 -0800 Mark Sapiro m...@msapiro.net wrote:
> [...]
> We were almost there with the external archiver method. Let's try to make that work. What do you have now in the external archiver code and in the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER strings and what is the problem?

Hello again!

I think I found the problem: the script works now, but since I wrote my own archiver, it doesn't do the pipermail part (i.e. update the mails in the archive). I thought that this code:

    mlist = MailList.MailList(maillist, lock=False)
    msg = email.message_from_file(sys.stdin, Message.Message)
    f = StringIO(str(sys.stdin))
    h = HyperArch.HyperArchive(mlist)
    h.processUnixMailbox(f)
    f.close()

did everything, but after reading a bit of the code, it doesn't exactly. It saves to the .mbox file, right? I tried to find where it does the pipermail stuff, but it's a bit complicated (I'm not so at ease with Python). Any clue?

Thank you

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
On Wed, 03 Mar 2010 10:04:31 -0800 Mark Sapiro m...@msapiro.net wrote:
> [...]
> We were almost there with the external archiver method. Let's try to make that work. What do you have now in the external archiver code and in the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER strings and what is the problem?

uho, found it !!

    mailman/bin/arch toto

I guess that's all :))

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
On 3/4/2010 4:23 AM, Cedric Jeanneret wrote:
> I think I found what's the problem is : the script works now, but as I write my own archiver, it doesn't do the pipermail part (i.e. update mails in archive)... I thought that this code :
>
>     mlist = MailList.MailList(maillist, lock=False)
>     msg = email.message_from_file(sys.stdin, Message.Message)
>     f = StringIO(str(sys.stdin))
>     h = HyperArch.HyperArchive(mlist)
>     h.processUnixMailbox(f)
>     f.close()
>
> did all, but after reading a bit of code, it doesn't exactly. It saves to .mbox file, right ?

No. It doesn't save to the .mbox file. If you look at the ArchiveMail() method in Mailman/Archiver/Archiver.py, it first saves to the .mbox by doing

    if mm_cfg.ARCHIVE_TO_MBOX in (1, 2):
        self.__archive_to_mbox(msg)

Then it either calls the external archiver or executes essentially the above to archive the mail in the pipermail archive. What you are missing is h.close() and that's why it doesn't work.

> I tried to find where it does the pipermail stuff, but it's a bit complicated [I'm not so at ease with Python].

Yes, the archiver is very convoluted because classes are subclassed and methods overridden all over. Don't feel bad. I've been looking at it for years and still only barely understand it.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
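The point about the missing h.close() can be illustrated with a small stand-alone sketch. The class below is a made-up stand-in, not Mailman's HyperArchive; it simply assumes an archive object that only publishes its work when close() is called, which is one plausible reading of why the message never showed up in pipermail. `contextlib.closing` is shown as one way to make sure close() is never forgotten.

```python
from contextlib import closing


class BufferedArchive:
    """Hypothetical stand-in for an archive object (NOT Mailman's HyperArchive)."""

    def __init__(self):
        self.pending = []     # articles processed but not yet written out
        self.published = []   # articles visible in the "archive"

    def process(self, article):
        self.pending.append(article)

    def close(self):
        # Everything processed so far becomes visible only here.
        self.published.extend(self.pending)
        self.pending = []


# Forgetting close(): the article was processed but never published.
forgotten = BufferedArchive()
forgotten.process("msg-1")

# closing() calls close() on exit, even if process() raises.
with closing(BufferedArchive()) as archive:
    archive.process("msg-1")
```

After this runs, `forgotten.published` is still empty while `archive.published` holds the article, which mirrors the symptom described in the thread.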
Re: [Mailman-Users] Indexing mail right after delivery
On 3/4/2010 4:46 AM, Cedric Jeanneret wrote:
> uho, found it !!
>
>     mailman/bin/arch toto
>
> I guess that's all :))

You may or may not be able to use bin/arch, but you can't use it in conjunction with an external archiver because of list locking. If you call bin/arch from your external archiver and wait for it to return, you will have a deadlock, and if you don't wait, it won't run until after your external archiver finishes.

I.e., an external archiver command like

    '|/path/bin/arch %(listname)s;/path/myscript.py %(listname)s'

creates a deadlock, and one like

    '|/path/bin/arch %(listname)s & /path/myscript.py %(listname)s'

doesn't work because myscript.py has to complete before bin/arch can obtain the list lock.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
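The two failure modes described above can be reproduced in miniature. The sketch below is only an analogy: a `threading.Lock` stands in for Mailman's list lock, threads stand in for the bin/arch process, and all function names are invented for the example. A timeout is used in the first case so that the demonstration terminates instead of actually deadlocking.

```python
import threading

list_lock = threading.Lock()  # stand-in for the per-list lock


def run_bin_arch(timeout):
    """Stand-in for bin/arch: it needs the list lock before it can do anything."""
    got_lock = list_lock.acquire(timeout=timeout)
    if got_lock:
        list_lock.release()
    return got_lock


def archiver_that_waits(timeout=0.2):
    """The caller holds the lock and waits for the child: the child can never get it."""
    result = {}
    with list_lock:
        child = threading.Thread(
            target=lambda: result.update(ok=run_bin_arch(timeout))
        )
        child.start()
        child.join()  # waiting while still holding the lock
    return result["ok"]


def archiver_that_does_not_wait():
    """The child is started in the background; it only runs once the lock is freed."""
    order = []

    def child_work():
        with list_lock:
            order.append("bin/arch")

    with list_lock:
        child = threading.Thread(target=child_work)
        child.start()
        order.append("myscript.py")  # done while the child is still blocked
    child.join()
    return order
```

Here `archiver_that_waits()` reports that the child never obtained the lock (with a blocking acquire it would hang forever), and `archiver_that_does_not_wait()` shows the child's work landing after the caller's, which is the ordering problem for an indexer that expects the archive to be updated first.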
Re: [Mailman-Users] Indexing mail right after delivery
On Thu, 04 Mar 2010 06:49:54 -0800 Mark Sapiro m...@msapiro.net wrote:
> [...]
> What you are missing is h.close() and that's why it doesn't work.
> [...]

Hmmm, I call h.close() a bit later (I catch its latest ID so that I can build the direct URL for my indexer). But for now, I guess I'm done. I've opened a bug on Launchpad (I couldn't figure out where else to put my stuff): https://bugs.launchpad.net/mailman/+bug/531942 It contains my scripts and some information on how to use them.

Indeed, the arch script uses locks. I copied it, removed the lock stuff, and used this version. All works fine now.

I'm happy I could understand a bit (well... a very little bit) of how Mailman works.

Thanks again!

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
On 3/4/2010 7:10 AM, Cedric Jeanneret wrote:
> hmmm, I use the h.close() a bit after (I catch its latest ID so that I can build the direct URL for my indexer). But for now, I guess I'm done. I've opened a bug (didn't figure where I could put my stuff) on launchpad: https://bugs.launchpad.net/mailman/+bug/531942 It contains my scripts, and some information on how to use them.

I've seen your bug in the tracker. It's too bad Launchpad calls everything a bug, but that's the right place.

> Indeed, arch script uses locks. I copied it, removed the lock stuff, and used this version. All work fine now.

I will have some comments after I look at this more. I think there is redundant stuff, but I'll comment further after I look in detail.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Re: [Mailman-Users] Indexing mail right after delivery
On Tue, 02 Mar 2010 11:34:25 -0800 Mark Sapiro m...@msapiro.net wrote:
> On 3/2/2010 3:41 AM, Cedric Jeanneret wrote:
> > On Fri, 26 Feb 2010 10:15:13 -0800 Mark Sapiro m...@msapiro.net wrote:
> > > At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is
> > >
> > >     from Mailman.Archiver import HyperArch
> > >     from cStringIO import StringIO
> > >     f = StringIO(str(msg))
> > >     h = HyperArch.HyperArchive(mlist)
> > >     h.processUnixMailbox(f)
> > >     h.close()
> > >     f.close()
> > >
> > > Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point
> > >
> > >     mlist.Save()
> > >     mlist.Unlock()
> > >
> > > and the message is now in the pipermail archive and can be indexed.
> >
> > Hello again,
> >
> > I'm having some troubles with my code. According to what Mark said, I've done this :
> >
> >     #!/usr/bin/env python
> >     import sys
> >     sys.path.insert(0,'/usr/lib/mailman')
> >     import syslog
> >     syslog.syslog('begin script')
> >     import email
> >     from Mailman import MailList
> >     from Mailman import Message
> >     ## archive part
> >     from Mailman.Archiver import HyperArch
> >     from cStringIO import StringIO
> >     maillist = sys.argv[2]
> >     hostname = sys.argv[1]
> >     msg = email.message_from_file(sys.stdin, Message.Message)
> >     syslog.syslog(maillist)
> >     mlist = MailList.MailList(maillist, lock=True)
> >     syslog.syslog('processing archiver')
> >     ## let archive it
> >     f = StringIO(str(msg))
> >     h = HyperArch.HyperArchive(mlist)
> >     h.processUnixMailbox(f)
> >     h.close()
> >     f.close()
> >     mlist.Save()
> >     mlist.Unlock()
> >     mlist.ArchiveMail(msg)
>
> Here is one problem. Remove the above line. As I tried to say above you can't do this. The lines above from f = StringIO(str(msg)) through f.close() archive the message. When you call mlist.ArchiveMail(msg), it reinvokes your external archiver in an endless loop. You need to remove the mlist.ArchiveMail(msg).
>
> The locking problem is something else. The external archiver is called with the list locked, thus when we try to instantiate the list 'locked', we have a deadlock. Thus, you never saw the loop because of the deadlock. The good news is we don't have to pass a locked list instance to HyperArch.HyperArchive() as it uses a special archiver lock. So, replace
>
>     mlist = MailList.MailList(maillist, lock=True)
>
> with
>
>     mlist = MailList.MailList(maillist, lock=False)
>
> and remove the mlist.Unlock() as your instance isn't locked, and ArchRunner will unlock its list instance when you exit.
>
> >     syslog.syslog('processing indexer')
> >     ### coming soon
> >     syslog.syslog('exiting - all ok')
> >     sys.exit(0)
> >
> > syslog is for debug purpose only. And if I send an email on my ML, I have this kind of error:
> >
> >     Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking

Hmm, it seems it crashes in pipermail.py, in function processUnixMailbox: we have a pos = input.tell() on line 564, but unfortunately input does NOT have any tell() method... It returns a 41 status.

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
On 3/2/2010 11:02 PM, Cedric Jeanneret wrote:
> Woops, right. It was commented out in my code.
>
> For now, I'm poking around with some other problems, such as my external archiver returning a non-zero status. It seems to crash at h.processUnixMailbox(f).
>
> Is there any way to have a backtrace of python errors (i.e. testing it through the shell)? I guess I can write a file with all email content, included headers, and pipe it in my file. Right ?

There are several choices.

You could try adding 'filename' to your external archiver command string. That will probably work.

You can do as you suggest above.

You can replace your

    import syslog

with

    from Mailman.Logging.Syslog import syslog
    from Mailman.Logging.Utils import LogStdErr

and add

    LogStdErr('debug', 'mailmanctl', manual_reprime=0)

and change your

    syslog.syslog('debug text')

statements to

    syslog('debug', 'debug text')

This will write all stderr output plus your 'debug text' entries to a log named debug in Mailman's logs directory. (You can name the log anything you want. It will be created if it doesn't exist.)

I see you've gotten further. I'll respond to that post.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
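For readers who want a traceback without touching Mailman's logging, a generic alternative is to catch the exception in the script itself and write the traceback somewhere readable. The sketch below uses only the standard `traceback` module; it is a substitute technique, not the LogStdErr approach described above, and the helper name `run_logged` is invented for the example.

```python
import io
import traceback


def run_logged(func, log):
    """Call func(); on failure write the full traceback to `log`.

    Returns 0 on success and 1 on failure, so the value can be passed
    to sys.exit() by a script that is run as an external command.
    """
    try:
        func()
        return 0
    except Exception:
        traceback.print_exc(file=log)
        return 1


def broken_step():
    # Deliberately fails, standing in for whatever step crashes.
    return 1 / 0


log = io.StringIO()          # a real script would open a log file instead
status = run_logged(broken_step, log)
```

In a real script `log` would be an open file in a directory the Mailman user can write to, and the script could then also be tested from the shell by feeding it a saved message on standard input, as suggested in the quoted question.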
Re: [Mailman-Users] Indexing mail right after delivery
On 3/3/2010 12:57 AM, Cedric Jeanneret wrote:
> On Tue, 02 Mar 2010 11:34:25 -0800 Mark Sapiro m...@msapiro.net wrote:
> > On 3/2/2010 3:41 AM, Cedric Jeanneret wrote:
> > > [...]
> > >     from cStringIO import StringIO
> > > [...]
> > >     f = StringIO(str(msg))
> > >     h = HyperArch.HyperArchive(mlist)
> > >     h.processUnixMailbox(f)
> > > [...]
>
> Hmm, it seems it crashes in pipermail.py, in function processUnixMailbox: we have a pos = input.tell() on line 564, but unfortunately input does NOT have any tell() method... It returns a 41 status.

Something is strange. The input object in 'pos = input.tell()' is the StringIO instance you passed as 'f', and StringIO objects do have a tell method. Also, the above code snippet is exactly what the builtin archiver uses, and I tested it and it worked for me.

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Re: [Mailman-Users] Indexing mail right after delivery
On Wed, Mar 3, 2010 at 4:44 PM, Mark Sapiro m...@msapiro.net wrote:
> On 3/3/2010 12:57 AM, Cedric Jeanneret wrote:
> > On Tue, 02 Mar 2010 11:34:25 -0800 Mark Sapiro m...@msapiro.net wrote:
> > > On 3/2/2010 3:41 AM, Cedric Jeanneret wrote:
> > > > [...]
> > > >     from cStringIO import StringIO
> > > > [...]
> > > >     f = StringIO(str(msg))
> > > >     h = HyperArch.HyperArchive(mlist)
> > > >     h.processUnixMailbox(f)
> > > > [...]
> >
> > Hmm, it seems it crashes in pipermail.py, in function processUnixMailbox: we have a pos = input.tell() on line 564, but unfortunately input does NOT have any tell() method... It returns a 41 status.
>
> Something is strange. The input object in 'pos = input.tell()' is the StringIO instance you passed as 'f', and StringIO objects do have a tell method. Also, the above code snippet is exactly what the builtin archiver uses, and I tested it and it worked for me.
>
> --
> Mark Sapiro m...@msapiro.net          The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan

Maybe a Python version issue? What is really strange is that it works inside the archiver.

I tried NOT using email.message_from_file (i.e. using StringIO directly on sys.stdin), and it worked fine. In fact, the error was that Message doesn't have a tell() method...

Another error was really annoying: ALL worked. Almost. I couldn't do my mlist.Save(), as there was an error for the lockfile. I did:

    mlist = MailList.MailList('toto', lock=False)
    # other code
    mlist.Save()

and it crashed. After poking into the MailList code, I saw that it refreshes the lockfile. Commenting out this line made it work again, more or less: the message was in the mbox, but wasn't in the pipermail archives.

Poking around on the Net, I found this post http://www.mail-archive.com/mailman-users@python.org/msg47499.html that you answered some months (well, years) ago. I tried it this way: applying the patch, so that it uses Mailman's internal archiver and calls my indexer right after. That's not really clean, and it's not really portable, but it works. The fact that I have to patch a file from the mailman package annoys me a bit, but... I didn't have any success with the ways you showed me :(

To be honest, maybe I'll try to put a handler (like XapianIndexer.py) in place for this. As I saw how to debug my scripts (thank you for the tip), I guess it would be the best way, instead of patching code (which will be overwritten on the next update).

Or maybe there's a variable in mm_cfg (or Defaults) which tells Mailman to call a script after archiving? I didn't see such a thing; I guess that's the role of the GLOBAL_PIPELINE and its chain of handlers...

Thank you for the time you spend on my problem.

Best regards,

C.
Re: [Mailman-Users] Indexing mail right after delivery
On 3/3/2010 9:20 AM, Cédric Jeanneret wrote:
> Maybe a python version? What is really strange is that it works inside the archiver
>
> I tried to NOT use email.message_from_file (so use directly StringIO on sys.stdin), and it worked fine. In fact, the error was that Message doesn't have tell() method...

Which says you are passing a Message object, not a StringIO or file object. I considered at one point just passing sys.stdin directly, but that won't work because sys.stdin does not have seek() or tell() methods.

> Another error was really annoying : ALL worked. almost. I couldn't do my mlist.Save(), as there was an error for the lockfile. I did :
>
>     mlist = MailList.MailList('toto', lock=False)
>     # other code
>     mlist.Save()

Right. I overlooked the fact that you can't Save() an unlocked list. But, I don't think you need to. I don't think the archiver actually updates your list instance in its processing, so you should be OK if you just remove the Save() from your code.

> - crashed. After poking into MailList code, I saw that it refreshes the lockfile. Commenting out this line made it work again more or less : message was in mbox, but wasn't in pipermail archives

Don't do that. It won't work anyway because the locked list object in ArchRunner will be saved after you're done and will undo any changes you made to your list object. But, as I say, you shouldn't need to save your list object. It is only passed to the HyperArch.HyperArchive() constructor so the archiver knows where to find the archive. I don't think it is updated.

> Poking on the Net, I found this post http://www.mail-archive.com/mailman-users@python.org/msg47499.html you answered some months (well, years) ago. I tried this way : applying the patch, so that it uses mailman internal archiver, and it calls my indexer right after. That's not really clean, it's not really a portable way, but it works. The fact that I have to patch a file from mailman package annoys me a bit, but... I didn't have any success with the ways you showed me :(
>
> To be honest, maybe I'll try to put a handler (like XapianIndexer.py) for this. As I saw how to debug my scripts (thank you for the tip), I guess it would be the best way, instead of patching a code (which will be overridden on the next update).
>
> Or maybe there's a variable in mm_config (or defaults) which tell mailman to call a script after archiving ? I didn't see such a thing, I guess that's the role of the GLOBAL_PIPELINE and its handlers chain...

As I tried to point out in my initial reply http://mail.python.org/pipermail/mailman-users/2010-February/068900.html, that won't work. The pipeline includes ToArchive which only queues the message in the archive queue for ArchRunner. Then IncomingRunner continues processing the pipeline. When it gets to your handler, there's no guarantee that ArchRunner has yet archived the message, so how do you index something that may not yet even be there?

We were almost there with the external archiver method. Let's try to make that work. What do you have now in the external archiver code and in the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER strings and what is the problem?

--
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
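The tell() confusion discussed above comes down to which kind of object is handed to the archiver: a parsed message object has no file-like methods, while an in-memory buffer does. The sketch below shows the difference using present-day Python's `io.StringIO` in place of the `cStringIO` module used in the thread; the helper name `buffer_stream` is invented, and another `StringIO` stands in for a piped sys.stdin.

```python
import email
import io


def buffer_stream(stream):
    """Read a stream once and return a seekable in-memory copy of its text."""
    return io.StringIO(stream.read())


raw = "From: user@example.com\nSubject: test\n\nbody\n"
fake_stdin = io.StringIO(raw)          # stand-in for sys.stdin

f = buffer_stream(fake_stdin)          # file-like: has tell() and seek()
msg = email.message_from_string(f.getvalue())   # parsed message: not file-like

start = f.tell()                       # position before any reading
first_line = f.readline()
f.seek(start)                          # rewind, as a mailbox parser may need to
```

Here `f` can be positioned and rewound, whereas `msg` has no tell() at all, so passing the message object where a file-like object is expected fails in the way reported. Note also that once the input stream has been read to the end, reading it again yields nothing, so it must be buffered once and reused.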
Re: [Mailman-Users] Indexing mail right after delivery
On Wed, 03 Mar 2010 10:04:31 -0800 Mark Sapiro m...@msapiro.net wrote: On 3/3/2010 9:20 AM, Cédric Jeanneret wrote: Maybe a python version? What is really strange is that it works inside the archiver I tried to NOT use email.message_from_file (so use directly StringIO on sys.stdin), and it worked fine. In fact, the error was that Message doesn't have tell() method... Which says you are passing a Message object, not a StringIO or file object. I considered at one point just passing sys.stdin directly, but that won't work because sys.stdin does not have seek() or tell() methods. Another error was really annoying : ALL worked. almost. I couldn't do my mlist.Save(), as there was an error for the lockfile. I did : mlist = MailList.MailList('toto', lock=False) # other code mlist.Save() Right. I overlooked the fact that you can't Save() an unlocked list. But, I don't think you need to. I don't think the archiver actually updates your list instance in it's processing, so you should be OK if you just remove the Save() from your code. - crashed. After poking into MailList code, I saw that it refreshes the lockfile. Commenting out this line made it work again more or less : message was in mbox, but wasn't in pipermail archives Don't do that. It won't work anyway because the locked list object in ArchRunner will be saved after you're done and will undo any changes you made to your list object. But, as I say, you shouldn't need to save your list object. It is only passed to the HyperArch.HyperArchive() constructor so the archiver knows where to find the archive. I don't think it is updated. Poking on the Net, I found this post http://www.mail-archive.com/mailman-users@python.org/msg47499.html you answered some months (well, years) ago. I tried this way : applying the patch, so that it uses mailman internal archiver, and it calls my indexer right after. That's not really clean, it's not really a portable way, but it works. 
The fact that I have to patch a file from the mailman package annoys me a bit, but... I didn't have any success with the ways you showed me :(

To be honest, maybe I'll try to put a handler (like XapianIndexer.py) for this. As I saw how to debug my scripts (thank you for the tip), I guess it would be the best way, instead of patching code (which will be overridden on the next update). Or maybe there's a variable in mm_config (or defaults) which tells mailman to call a script after archiving? I didn't see such a thing. I guess that's the role of the GLOBAL_PIPELINE and its handlers chain...

As I tried to point out in my initial reply http://mail.python.org/pipermail/mailman-users/2010-February/068900.html, that won't work. The pipeline includes ToArchive, which only queues the message in the archive queue for ArchRunner. Then IncomingRunner continues processing the pipeline. When it gets to your handler, there's no guarantee that ArchRunner has yet archived the message, so how do you index something that may not yet even be there?

We were almost there with the external archiver method. Let's try to make that work. What do you have now in the external archiver code and in the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER strings, and what is the problem?

Hello again,

First of all, I want to thank you for the time you spend on my case. I really appreciate it.

Now, for my code: I attached the latest (buggy) version of my archive-and-index.py script. I've rolled back to the way you told me, so that we won't go in all directions. You'll find another attachment: a debug file I added in this way:

    PUBLIC_EXTERNAL_ARCHIVER = '/root/archive-and-index.py %(hostname)s %(listname)s /var/log/mailman/archiver'

It seems that the Message.Message stays, even if we create a new StringIO variable... weird.

Just in case:

    python --version
    Python 2.5.2

Maybe there's a problem with this version...? If so, it will be a little problem, as it's the lenny version.

I'll keep on trying, and keep you updated as soon as I have some new things.

Thanks again.

Best regards,

C.

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
Re: [Mailman-Users] Indexing mail right after delivery
On Fri, 26 Feb 2010 10:15:13 -0800 Mark Sapiro m...@msapiro.net wrote: On 2/26/2010 4:20 AM, Cedric Jeanneret wrote: On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro m...@msapiro.net wrote: Cedric Jeanneret wrote: I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that. My first question is : is there already such a thing ? I searched on the net, but nothing appeared My second one : can we create a plugin for mailman, if so, where should I go to have some doc ? seems there's nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all) Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching. This will be quite doable with Mailman 3 which is still in development. There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive. A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py' in mm_cfg.py, then that script will be invoked do do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message. Hello again, Just one question : what do mlist, msg, msgdata stand for ? 
As I read I've to create my module and define a process(mlist, msg, msgdata) inside it, I'd like to know what are those objects. I discovered that mlist stands for a Mailman.MailList.MailList('list-name'), but for the others, it's a bit hard to find... Only custom handlers need to define process(mlist, msg, msgdata). That is the entry point to the handler and three objects are passed mlist is the Mailman.MailList.MailList() instance for the current list msg is a Mailman.Message.Message() (subclass of email.Message.Message) instance for the current message msgdata is a dictionary of the message metadata accumulated so far. The important thing is these are passed in as arguments to the handler process() function. In your case, you are defining a module which is going to be invoked like the following. Suppose that PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %listname)s' It will be invoked in a pipe similar to cat raw_message | /path/to/myarch.py HOST LIST i.e. the command string with %(hostname)s and %listname)s replaced by the actual host name and list name of the list will be invoked and the message piped to it. So, it could begin something like: #!python import sys sys.path.insert(0, 'path/to/mailman/bin') # The above line can be skipped if myarch.py is in Mailman's # bin directory. import paths import email from Mailman import MailList from Mailman import Message msg = email.message_from_file(sys.stdin, Message.Message) mlist = MailList.MailList(sys.argv[1], lock=True) At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. 
Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is from Mailman.Archiver import HyperArch from cStringIO import StringIO f = StringIO(str(msg)) h = HyperArch.HyperArchive(mlist) h.processUnixMailbox(f) h.close() f.close() Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point mlist.Save() mlist.Unlock() and the message is now in the pipermail archive and can be indexed. Hello again, I'm having some troubles with my code. According to what Mark said, I've done this : #!/usr/bin/env python import sys sys.path.insert(0,'/usr/lib/mailman') import syslog syslog.syslog('begin script') import email from Mailman import MailList from Mailman import Message ## archive part from Mailman.Archiver import HyperArch from cStringIO import StringIO maillist = sys.argv[2] hostname = sys.argv[1] msg =
Re: [Mailman-Users] Indexing mail right after delivery
On 3/2/2010 3:41 AM, Cedric Jeanneret wrote:
On Fri, 26 Feb 2010 10:15:13 -0800 Mark Sapiro m...@msapiro.net wrote:

At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is

    from Mailman.Archiver import HyperArch
    from cStringIO import StringIO

    f = StringIO(str(msg))
    h = HyperArch.HyperArchive(mlist)
    h.processUnixMailbox(f)
    h.close()
    f.close()

Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point

    mlist.Save()
    mlist.Unlock()

and the message is now in the pipermail archive and can be indexed.

Hello again,

I'm having some trouble with my code. According to what Mark said, I've done this:

    #!/usr/bin/env python
    import sys
    sys.path.insert(0,'/usr/lib/mailman')
    import syslog
    syslog.syslog('begin script')
    import email
    from Mailman import MailList
    from Mailman import Message
    ## archive part
    from Mailman.Archiver import HyperArch
    from cStringIO import StringIO
    maillist = sys.argv[2]
    hostname = sys.argv[1]
    msg = email.message_from_file(sys.stdin, Message.Message)
    syslog.syslog(maillist)
    mlist = MailList.MailList(maillist, lock=True)
    syslog.syslog('processing archiver')
    ## let archive it
    f = StringIO(str(msg))
    h = HyperArch.HyperArchive(mlist)
    h.processUnixMailbox(f)
    h.close()
    f.close()
    mlist.Save()
    mlist.Unlock()
    mlist.ArchiveMail(msg)

Here is one problem. Remove the above line. As I tried to say above, you can't do this. The lines above from

    f = StringIO(str(msg))

through

    f.close()

archive the message. When you call mlist.ArchiveMail(msg), it reinvokes your external archiver in an endless loop. You need to remove the mlist.ArchiveMail(msg).

The locking problem is something else. The external archiver is called with the list locked, so when we try to instantiate the list 'locked', we have a deadlock. That is why you never saw the loop: the deadlock came first. The good news is we don't have to pass a locked list instance to HyperArch.HyperArchive(), as it uses a special archiver lock. So, replace

    mlist = MailList.MailList(maillist, lock=True)

with

    mlist = MailList.MailList(maillist, lock=False)

and remove the mlist.Unlock(), as your instance isn't locked, and ArchRunner will unlock its list instance when you exit.

    syslog.syslog('processing indexer')
    ### coming soon
    syslog.syslog('exiting - all ok')
    sys.exit(0)

syslog is for debug purposes only. And if I send an email on my ML, I have this kind of error:

    Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking

--
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
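Putting the corrections from this message together, the external archiver might look roughly like the sketch below. It is an untested outline, not the script that was eventually attached to the tracker. Assumptions: Mailman 2.1 is installed under /usr/lib/mailman (the path used in the script above), the text piped in is already in mbox form, and the names parse_args, archive and main are made up here. The Mailman imports sit inside archive() so that the argument handling can be tried without Mailman installed.

```python
import sys

try:
    from cStringIO import StringIO   # Python 2, which Mailman 2.1 runs under
except ImportError:
    from io import StringIO          # only so parse_args can be tried on Python 3

MAILMAN_PREFIX = '/usr/lib/mailman'  # assumption: path from the script above


def parse_args(argv):
    """Return (hostname, listname) from '<script> HOSTNAME LISTNAME'."""
    if len(argv) < 3:
        raise SystemExit('usage: %s HOSTNAME LISTNAME' % argv[0])
    return argv[1], argv[2]


def archive(listname, raw_text):
    """Add one raw message to the list's pipermail archive."""
    sys.path.insert(0, MAILMAN_PREFIX)
    from Mailman import MailList
    from Mailman.Archiver import HyperArch

    # lock=False: the caller already holds the list lock, so asking for
    # it again would deadlock. For the same reason there is no Save()
    # and no Unlock() here.
    mlist = MailList.MailList(listname, lock=False)
    f = StringIO(raw_text)           # file-like, with seek() and tell()
    h = HyperArch.HyperArchive(mlist)
    h.processUnixMailbox(f)
    h.close()
    f.close()
    # Deliberately no mlist.ArchiveMail(): it would re-invoke this script.


def main(argv):
    hostname, listname = parse_args(argv)
    archive(listname, sys.stdin.read())
    # The indexer call would go here (not shown).
    return 0
```

To use it as the external archiver, the file would end with sys.exit(main(sys.argv)).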
Re: [Mailman-Users] Indexing mail right after delivery
On Tue, 02 Mar 2010 11:34:25 -0800 Mark Sapiro m...@msapiro.net wrote: On 3/2/2010 3:41 AM, Cedric Jeanneret wrote: On Fri, 26 Feb 2010 10:15:13 -0800 Mark Sapiro m...@msapiro.net wrote: At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is from Mailman.Archiver import HyperArch from cStringIO import StringIO f = StringIO(str(msg)) h = HyperArch.HyperArchive(mlist) h.processUnixMailbox(f) h.close() f.close() Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point mlist.Save() mlist.Unlock() and the message is now in the pipermail archive and can be indexed. Hello again, I'm having some troubles with my code. According to what Mark said, I've done this : #!/usr/bin/env python import sys sys.path.insert(0,'/usr/lib/mailman') import syslog syslog.syslog('begin script') import email from Mailman import MailList from Mailman import Message ## archive part from Mailman.Archiver import HyperArch from cStringIO import StringIO maillist = sys.argv[2] hostname = sys.argv[1] msg = email.message_from_file(sys.stdin, Message.Message) syslog.syslog(maillist) mlist = MailList.MailList(maillist, lock=True) syslog.syslog('processing archiver') ## let archive it f = StringIO(str(msg)) h = HyperArch.HyperArchive(mlist) h.processUnixMailbox(f) h.close() f.close() mlist.Save() mlist.Unlock() mlist.ArchiveMail(msg) Here is one problem. Remove the above line. As I tried to say above you can't do this. The lines above from f = StringIO(str(msg)) through f.close() archive the message. 
When you call mlist.ArchiveMail(msg), it reinvokes your external archiver in an endless loop. You need to remove the mlist.ArchiveMail(msg).

The locking problem is something else. The external archiver is called with the list locked, so when we try to instantiate the list 'locked', we have a deadlock. That is why you never saw the loop: the deadlock came first. The good news is we don't have to pass a locked list instance to HyperArch.HyperArchive(), as it uses a special archiver lock. So, replace

    mlist = MailList.MailList(maillist, lock=True)

with

    mlist = MailList.MailList(maillist, lock=False)

and remove the mlist.Unlock(), as your instance isn't locked, and ArchRunner will unlock its list instance when you exit.

    syslog.syslog('processing indexer')
    ### coming soon
    syslog.syslog('exiting - all ok')
    sys.exit(0)

syslog is for debug purposes only. And if I send an email on my ML, I have this kind of error:

    Mar 02 12:38:33 2010 (28380) toto.lock lifetime has expired, breaking

Woops, right. It was commented out in my code. For now, I'm poking around with some other problems, such as my external archiver returning a non-zero status. It seems to crash at h.processUnixMailbox(f).

Is there any way to get a backtrace of Python errors (i.e. by testing it through the shell)? I guess I can write a file with all the email content, headers included, and pipe it into my script. Right?

Thank you!

C.

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
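The idea in the question above (save a complete message to a file, pipe it into the script, read the traceback) could be sketched as follows. The helper name run_archiver is made up, and a deliberately failing one-liner stands in for the real archiver script so that the example is self-contained.

```python
import subprocess
import sys


def run_archiver(argv, raw_message):
    """Pipe raw_message (bytes) to the command argv.

    Returns (exit_status, stderr_text); a Python traceback, if there is
    one, ends up in stderr_text.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    _, err = proc.communicate(raw_message)
    return proc.returncode, err.decode('utf-8', 'replace')


# Stand-in for the real script: reads the message, then fails on purpose.
failing_script = 'import sys; sys.stdin.read(); raise ValueError("boom")'
status, errors = run_archiver([sys.executable, '-c', failing_script],
                              b'Subject: test\n\nbody\n')
```

With the real script, argv would be something like ['/root/archive-and-index.py', HOST, 'toto'] and raw_message the contents of the saved message file opened in binary mode. Running it as the same user Mailman runs as is probably wise, so that file permissions match.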
Re: [Mailman-Users] Indexing mail right after delivery
On Fri, Feb 26, 2010 at 7:15 PM, Mark Sapiro m...@msapiro.net wrote: On 2/26/2010 4:20 AM, Cedric Jeanneret wrote: On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro m...@msapiro.net wrote: Cedric Jeanneret wrote: I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that. My first question is : is there already such a thing ? I searched on the net, but nothing appeared My second one : can we create a plugin for mailman, if so, where should I go to have some doc ? seems there's nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all) Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching. This will be quite doable with Mailman 3 which is still in development. There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive. A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py' in mm_cfg.py, then that script will be invoked do do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message. Hello again, Just one question : what do mlist, msg, msgdata stand for ? 
As I read I've to create my module and define a process(mlist, msg, msgdata) inside it, I'd like to know what are those objects. I discovered that mlist stands for a Mailman.MailList.MailList('list-name'), but for the others, it's a bit hard to find... Only custom handlers need to define process(mlist, msg, msgdata). That is the entry point to the handler and three objects are passed mlist is the Mailman.MailList.MailList() instance for the current list msg is a Mailman.Message.Message() (subclass of email.Message.Message) instance for the current message msgdata is a dictionary of the message metadata accumulated so far. The important thing is these are passed in as arguments to the handler process() function. In your case, you are defining a module which is going to be invoked like the following. Suppose that PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %listname)s' It will be invoked in a pipe similar to cat raw_message | /path/to/myarch.py HOST LIST i.e. the command string with %(hostname)s and %listname)s replaced by the actual host name and list name of the list will be invoked and the message piped to it. So, it could begin something like: #!python import sys sys.path.insert(0, 'path/to/mailman/bin') # The above line can be skipped if myarch.py is in Mailman's # bin directory. import paths import email from Mailman import MailList from Mailman import Message msg = email.message_from_file(sys.stdin, Message.Message) mlist = MailList.MailList(sys.argv[1], lock=True) At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. 
Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is from Mailman.Archiver import HyperArch from cStringIO import StringIO f = StringIO(str(msg)) h = HyperArch.HyperArchive(mlist) h.processUnixMailbox(f) h.close() f.close() Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point mlist.Save() mlist.Unlock() and the message is now in the pipermail archive and can be indexed. -- Mark Sapiro m...@msapiro.net The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan wow, thanks a lot, with all this I'll be able to do what I want! I'll post all my stuff as soon as I've done it, hopefully next week :). Thanks again. Best regards, C. -- Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://wiki.list.org/x/AgA3 Security Policy:
Re: [Mailman-Users] Indexing mail right after delivery
On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro m...@msapiro.net wrote: Cedric Jeanneret wrote: I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that. My first question is : is there already such a thing ? I searched on the net, but nothing appeared My second one : can we create a plugin for mailman, if so, where should I go to have some doc ? seems there's nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all) Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching. This will be quite doable with Mailman 3 which is still in development. There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive. A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py' in mm_cfg.py, then that script will be invoked do do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message. Hello again, Just one question : what do mlist, msg, msgdata stand for ? As I read I've to create my module and define a process(mlist, msg, msgdata) inside it, I'd like to know what are those objects. 
I discovered that mlist stands for a Mailman.MailList.MailList('list-name'), but for the others, it's a bit hard to find...

Thanks in advance.

C.

--
Cédric Jeanneret                 |  System Administrator
021 619 10 32                    |  Camptocamp SA
cedric.jeanne...@camptocamp.com  |  PSE-A / EPFL
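For reference, the handler entry point asked about here has the shape sketched below: a module-level function process(mlist, msg, msgdata) that receives the list object, the message object and a dictionary of per-message metadata. Everything the function body does is made up for illustration (the header name, the metadata key), and FakeList is only a stand-in so the function can be tried outside Mailman.

```python
from email.message import Message


def process(mlist, msg, msgdata):
    """Handler entry point, called for each message in the pipeline.

    mlist   -- the list object
    msg     -- the message object
    msgdata -- dictionary of per-message metadata
    """
    # Illustration only: a made-up header and a made-up metadata key.
    msg['X-Example-Handler'] = mlist.internal_name()
    msgdata['example_seen'] = True


class FakeList(object):
    """Stand-in for a real list object, for trying the handler by hand."""

    def internal_name(self):
        return 'toto'


msg = Message()
msgdata = {}
process(FakeList(), msg, msgdata)
```

A handler like this runs during incoming message processing, so it is not the right place for indexing archived messages. That is what the external archiver hooks are for.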
Re: [Mailman-Users] Indexing mail right after delivery
Cedric Jeanneret wrote:

Thank you very much for your answer. I guess the cleanest way would be to override the PUBLIC_EXTERNAL_ARCHIVER (we don't want to index our private archives for now). I'll give it a try as soon as possible.

Do you think my script will interest some people? If so, where should I post it?

Yes, I think it may be of interest. The best place is the tracker at https://bugs.launchpad.net/mailman plus a note to this list.

--
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Re: [Mailman-Users] Indexing mail right after delivery
On 2/26/2010 4:20 AM, Cedric Jeanneret wrote: On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro m...@msapiro.net wrote: Cedric Jeanneret wrote: I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that. My first question is : is there already such a thing ? I searched on the net, but nothing appeared My second one : can we create a plugin for mailman, if so, where should I go to have some doc ? seems there's nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all) Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching. This will be quite doable with Mailman 3 which is still in development. There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive. A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py' in mm_cfg.py, then that script will be invoked do do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message. Hello again, Just one question : what do mlist, msg, msgdata stand for ? As I read I've to create my module and define a process(mlist, msg, msgdata) inside it, I'd like to know what are those objects. 
I discovered that mlist stands for a Mailman.MailList.MailList('list-name'), but for the others, it's a bit hard to find...

Only custom handlers need to define process(mlist, msg, msgdata). That is the entry point to the handler, and three objects are passed:

mlist is the Mailman.MailList.MailList() instance for the current list

msg is a Mailman.Message.Message() (subclass of email.Message.Message) instance for the current message

msgdata is a dictionary of the message metadata accumulated so far.

The important thing is that these are passed in as arguments to the handler process() function.

In your case, you are defining a module which is going to be invoked like the following. Suppose that

    PUBLIC_EXTERNAL_ARCHIVER = '/path/to/myarch.py %(hostname)s %(listname)s'

It will be invoked in a pipe similar to

    cat raw_message | /path/to/myarch.py HOST LIST

i.e. the command string, with %(hostname)s and %(listname)s replaced by the actual host name and list name of the list, will be invoked and the message piped to it. So, it could begin something like:

    #!python
    import sys
    sys.path.insert(0, 'path/to/mailman/bin')
    # The above line can be skipped if myarch.py is in Mailman's
    # bin directory.
    import paths
    import email
    from Mailman import MailList
    from Mailman import Message

    msg = email.message_from_file(sys.stdin, Message.Message)
    # sys.argv[1] is the host name, sys.argv[2] the list name
    mlist = MailList.MailList(sys.argv[2], lock=True)

At this point, you have a list object (locked) and a message object. You might think you could just do mlist.ArchiveMail(msg) to archive the mail to the listname.mbox file and the pipermail archive, but that wouldn't quite work because that method would re-invoke the external archiver. Also, you don't need to worry about the listname.mbox file because the ArchiveMail() method already did that before invoking the external archiver, so what you would need is

    from Mailman.Archiver import HyperArch
    from cStringIO import StringIO

    f = StringIO(str(msg))
    h = HyperArch.HyperArchive(mlist)
    h.processUnixMailbox(f)
    h.close()
    f.close()

Which is what the ArchiveMail() method would do. Now you still have the mlist and msg objects, and you need to save and unlock the list at some point

    mlist.Save()
    mlist.Unlock()

and the message is now in the pipermail archive and can be indexed.

--
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
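The substitution of %(hostname)s and %(listname)s described in this message can be tried out with ordinary Python dictionary-based % formatting, as in the sketch below. Mailman's own substitution code is not shown in this thread, so this only illustrates the syntax; the helper name build_command and the host name lists.example.org are made up.

```python
def build_command(template, hostname, listname):
    """Fill %(hostname)s and %(listname)s in an external-archiver string."""
    return template % {'hostname': hostname, 'listname': listname}


template = '/path/to/myarch.py %(hostname)s %(listname)s'
command = build_command(template, 'lists.example.org', 'toto')

# What the script would then see as sys.argv (simple whitespace split;
# a real shell applies its own quoting rules).
argv = command.split()
hostname, listname = argv[1], argv[2]
```

Note the order: with this template the host name arrives as sys.argv[1] and the list name as sys.argv[2].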
Re: [Mailman-Users] Indexing mail right after delivery
Cedric Jeanneret wrote: I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that. My first question is : is there already such a thing ? I searched on the net, but nothing appeared My second one : can we create a plugin for mailman, if so, where should I go to have some doc ? seems there's nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all) Just to explain why I'd like to do that: we already have a xapian search engine in here, indexing a fileserver, request tracker queues and moinmoin wikis... so we'd like to aggregate all our stuff in one app for searching. This will be quite doable with Mailman 3 which is still in development. There are problems trying to do this in Mailman 2.1.x. There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive. A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py' PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py' in mm_cfg.py, then that script will be invoked do do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message. -- Mark Sapiro m...@msapiro.netThe highway is for gamblers, San Francisco Bay Area, Californiabetter use your sense - B. 
Dylan
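The point about archiving being asynchronous can be pictured with a toy model. Nothing below is Mailman code: a deque stands in for the archive queue, a list for the pipermail archive, and the function names are made up. The names ToArchive and ArchRunner in the comments come from elsewhere in this thread. The model shows one possible ordering, the one that breaks a handler-based indexer. In a real installation the order is simply not guaranteed.

```python
from collections import deque

archive_queue = deque()   # stands in for the archive queue
archived = []             # stands in for the pipermail archive


def to_archive(msg):
    """Like ToArchive as described in this thread: only queue the message."""
    archive_queue.append(msg)


def indexing_handler(msg):
    """A later pipeline handler asking: is the message archived yet?"""
    return msg in archived


def arch_runner():
    """Like ArchRunner: archive whatever is queued, whenever it gets to run."""
    while archive_queue:
        archived.append(archive_queue.popleft())


to_archive('message-1')
seen_by_handler = indexing_handler('message-1')   # pipeline continues at once
arch_runner()                                     # archiving happens later
seen_afterwards = indexing_handler('message-1')
```

Here seen_by_handler is False and seen_afterwards is True, which is why an indexer has to be triggered from the archiving side rather than from the incoming pipeline.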
Re: [Mailman-Users] Indexing mail right after delivery
On Thu, 25 Feb 2010 17:08:06 -0800 Mark Sapiro m...@msapiro.net wrote:

> Cedric Jeanneret wrote:
>
>> I'm trying to create a xapian[1] indexer for our mailing list. As mailman is written in Python and there are python bindings for xapian, I guess I can maybe create a plugin for that.
>>
>> My first question is: is there already such a thing? I searched on the net, but nothing appeared.
>>
>> My second one: can we create a plugin for mailman, and if so, where should I go to find some documentation? There seems to be nothing in the wiki (http://wiki.list.org/dosearchsite.action?searchQuery.queryString=pluginsearchQuery.spaceKey=conf_all).
>>
>> Just to explain why I'd like to do that: we already have a xapian search engine here, indexing a fileserver, request tracker queues and moinmoin wikis, so we'd like to aggregate all our stuff in one app for searching.
>
> This will be quite doable with Mailman 3, which is still in development. There are problems trying to do this in Mailman 2.1.x.
>
> There is a plugin capability of sorts in the form of custom handlers that can be added to the incoming message processing pipeline. See the FAQ at http://wiki.list.org/x/l4A9. However, archiving is asynchronous with incoming message processing, so it is not possible for a custom handler to know the URL that will ultimately retrieve the message from the archive.
>
> A different approach which might be workable is to use the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER hooks. If you set
>
> PUBLIC_EXTERNAL_ARCHIVER = '/path/to/script.py'
> PRIVATE_EXTERNAL_ARCHIVER = '/path/to/script.py'
>
> in mm_cfg.py, then that script will be invoked to do the archiving. The script in turn could invoke the standard pipermail archiving process and then invoke xapian to index the archived message.

Hello Mark,

Thank you very much for your answer. I guess the cleanest way would be to override PUBLIC_EXTERNAL_ARCHIVER only (we don't want to index our private archives for now). I'll give it a try as soon as possible.

Do you think my script would interest some people? If so, where should I post it?

Thanks again.

Best regards,

C.

-- 
Cédric Jeanneret | System Administrator
021 619 10 32 | Camptocamp SA
cedric.jeanne...@camptocamp.com | PSE-A / EPFL