Hi, folks,

I think I have found a bug in bin/arch, but I imagine someone has found it before. Also, I have run into an architectural limit and would like to change a constant to fix it, if possible.

*** Part One: Architectural Limit

I am trying to import large numbers of messages (several hundred, to as many as 900 for one of my lists) into Mailman from mbox files. I can only do about 80 at a time with the "arch" program. I have written an awk script which breaks my large mbox files into 80-message chunks, then I use a "for" loop from the bash shell to process all of these. So far the best result has come from starting with the earliest chunk, then using "cat" to append the next chunk to that and re-running "arch", and so on until all the chunks are added. The command sequence I'm using is:

    for a in <mylist>-split-*.mbox; do
        cat $a >> archives/private/<mylist>.mbox/<mylist>.mbox
        bin/arch <mylist>
        cron/nightly_gzip <mylist>
    done

This works, though it is cumbersome and slow.

I'm not a Python guru but do know C. I'm running on a machine with *lots* of RAM. Can someone point me to which module I need to edit so that I can increase the constant for an array size somewhere such that this thing will handle more than 80 messages? I figure if I know where to look I can puzzle out the code enough to make a simple change like that. But it will take days to find it in code of an unfamiliar language with many modules.

The FAQ, by the way, does not (and should) address this problem. It only instructs one to replace the archives/private/<mylist>.mbox/<mylist>.mbox file, then run bin/arch. That only works for small numbers of messages.

I'll make this proposal: Someone point me in the right direction to solve this thing. When I've got it working, I'll write an entry to submit for the FAQ to document this for the next person, as a way of contributing to the Mailman user community.

*** Part Two: Possible Bug

Sometimes bin/arch reaches a certain point and just... stops. No errors, but no further messages are added.
The "raw text" version of the archives grows in size, though, as the same messages get added again and again after this sticking point (before that, everything is well).

I did some investigation and have found that the problem occurs when a normal text line in the body of a message happens to begin with the string "From ". It appears bin/arch is not being very smart about recognizing the beginning of a new message. I think what is needed is to add a more detailed parsing regexp to the code that determines where one message ends and another begins.

I've worked around this on my own system by using some manual greps to find the problem strings, then fixing them by using vi to insert a blank ahead of the word "From" on these lines. It's cumbersome, but so far my success rate is 100%. Anyone have a better idea?

As with the other problem, I'm willing to document this for the FAQ if in fact it is a FAQ (pun intended).

Kind regards,
Scott

--
Scott Courtney       | "I don't mind Microsoft making money. I mind them
[EMAIL PROTECTED]    | having a bad operating system." -- Linus Torvalds
http://www.4th.com/  | ("The Rebel Code," NY Times, 21 February 1999)
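
The grep-and-vi workaround from Part Two could perhaps be automated along the following lines. This is only a sketch, written for a current Python 3, and it is not what bin/arch does internally. The names SEPARATOR and escape_body_from_lines are made up for illustration. It assumes that a genuine separator follows a blank line (or starts the file) and looks like "From address Sun Feb 21 10:15:00 1999"; mbox files with other date layouts would need a different pattern. It prefixes ">" (the conventional mbox quoting) rather than the blank inserted by hand above, though either prefix can be passed in.

```python
import re

# Assumed shape of a genuine mbox separator line, for example:
#   From user@example.com Sun Feb 21 10:15:00 1999
# This pattern is a guess at a "more detailed" check, not Mailman's own.
SEPARATOR = re.compile(
    r"^From \S+\s+"
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d "
    r"\d\d:\d\d:\d\d \d{4}\s*$"
)


def escape_body_from_lines(lines, prefix=">"):
    """Return a copy of lines with stray "From " lines prefixed.

    A line starting with "From " is left alone only if it follows a
    blank line (or is the first line) and matches SEPARATOR.
    """
    out = []
    previous_blank = True  # the start of the file counts as a boundary
    for line in lines:
        if line.startswith("From "):
            is_separator = previous_blank and SEPARATOR.match(line) is not None
            if not is_separator:
                line = prefix + line
        out.append(line)
        previous_blank = line.strip() == ""
    return out
```

To try it, read the mbox file into a list of lines, pass the list through escape_body_from_lines, and write the result to a new file, keeping the original until the output has been checked.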