Re: [Mailman-Users] archiving partial duplicates
Con Wieland wrote:
> When I checked the archives for the current month a number of partial messages showed up with no From line and no date.
[snip]
> Any ideas on what to look for would be appreciated.

Look for messages containing unescaped "From " lines in the body. The bin/cleanarch script can help with fixing a .mbox file that contains these.

--
Mark Sapiro [EMAIL PROTECTED]           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
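To see which lines are involved before changing anything, a rough check can list every line that starts with "From " but does not look like a genuine mbox separator. The bash sketch below is a heuristic under stated assumptions, not the test bin/cleanarch itself applies: it treats a "From " line as suspect if it does not follow a blank line (or open the file), or if the line after it is not a "Name:" style header. The function name suspect_from_lines is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, a rough heuristic only (NOT bin/cleanarch's own test).
# Lists lines in an mbox that start with "From " but do not look like real
# message separators. Assumption: a real separator follows a blank line (or
# is line 1) and is followed by a header line such as "Received:".
# Output format: "line-number: text".
suspect_from_lines() {
    awk '
        pending != "" {
            # Line after a candidate separator: expect a Name: header.
            if ($0 !~ /^[A-Za-z0-9-]+:/) print pending
            pending = ""
        }
        /^From / {
            if (NR > 1 && prev != "") print NR ": " $0   # no blank line before it
            else pending = NR ": " $0                    # decide on the next line
        }
        { prev = $0 }
        END { if (pending != "") print pending }         # From line at end of file
    ' "$1"
}

# Run on the file given as the first argument, if any.
if [ $# -gt 0 ]; then
    suspect_from_lines "$1"
fi
```

For example, from the Mailman installation directory (the script name is hypothetical):

    bash suspect-from.sh archives/private/listname.mbox/listname.mbox

If the reported lines turn out to be body text, bin/cleanarch can write an escaped copy of the mbox (it reads the mbox on standard input and writes the cleaned version to standard output), after which the archive is rebuilt with bin/arch.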
Re: [Mailman-Users] archiving partial duplicates
On 4/6/06, Dragon [EMAIL PROTECTED] wrote:
> So if you decided 500 was a good number to index, for the first chunk you would do:
>
> bin/arch --wipe --start=1 --end=500 listname
>
> For subsequent chunks you would do (adjusting the start and end indexes of course...):
>
> bin/arch --start=501 --end=1000 listname

Bash can do this for you:

    #!/bin/bash
    for i in $(seq 0 10)
    do
        # Only the first chunk wipes the old archive.
        if [ $i -eq 0 ]; then wipe=--wipe; else wipe=; fi
        bin/arch $wipe --start=$(( 1+$i*500 )) --end=$(( 500+$i*500 )) listname
    done

bin/arch needs the list name even with --wipe, so the wipe is done as part of the first chunk rather than as a separate "bin/arch --wipe" call. Also, if you have more than 5500 messages in the mbox file, you'll need to raise the second argument to 'seq'. And, of course, replace 'listname' with the list name.

--
- Patrick Bogen
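A fixed seq range has to be adjusted by hand as the archive grows. The sketch below instead derives the number of chunks from the mbox, counting lines that start with "From " and assuming the usual Mailman 2.1 location archives/private/listname.mbox/listname.mbox under the installation directory. It only prints the bin/arch commands so they can be reviewed first, and it passes --wipe on the first chunk only. The start/end numbers keep the 1-500, 501-1000 scheme used in this thread; bin/arch's help text in Mailman 2.1 describes article 0 as the first in the mbox, so check how your version treats --start and --end before relying on exact boundaries. The function name arch_chunk_commands is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: print (not run) the bin/arch commands for rebuilding
# a list's archive in chunks. --wipe is added to the first chunk only.
# Numbering follows the 1/500, 501/1000 scheme from this thread.
#   $1 = list name, $2 = number of messages, $3 = chunk size (default 500)
arch_chunk_commands() {
    local list=$1 total=$2 chunk=${3:-500}
    local start=1 opts
    while [ "$start" -le "$total" ]; do
        opts="--start=$start --end=$(( start + chunk - 1 ))"
        if [ "$start" -eq 1 ]; then
            opts="--wipe $opts"
        fi
        echo "bin/arch $opts $list"
        start=$(( start + chunk ))
    done
}

# With two arguments (list name, mbox path): count "From " lines in the
# mbox and print the commands for 500-message chunks.
if [ $# -eq 2 ]; then
    total=$(grep -c '^From ' "$2")
    arch_chunk_commands "$1" "$total" 500
fi
```

For example (script name hypothetical), review the output of

    bash arch-chunks.sh listname archives/private/listname.mbox/listname.mbox

and then pipe the same command to sh from the Mailman installation directory. If the mbox still contains unescaped "From " lines, the count comes out somewhat high, so the last chunks may start past the final real message.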
Re: [Mailman-Users] archiving partial duplicates
Patrick Bogen sent the message below at 08:18 4/7/2006:
> Bash can do this for you:
> [script snipped]
End original message.

Or Perl or Python or whatever your favorite scripting language might be... But that's just icing on the cake and not really a necessity.

The --wipe argument deletes all the old files and builds new ones. That's great, but it did have one unfortunate consequence for me. I am using htdig to allow searching of my archives, and when I rebuilt the archives after editing the templates and installing htdig, all of the file dates on the messages were set to the date of the rebuild. This destroyed the date context for the archive: the file dates displayed by the htdig search no longer reflected when each message was originally posted.

So I wrote a small Perl script to set the file date of each message file to the message date it contains. If anyone is interested in that script, I would be happy to share it. Just e-mail me directly and I will send it to you.
Dragon

~~~ Venimus, Saltavimus, Bibimus (et naribus canium capti sumus) ~~~
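Dragon's Perl script isn't included in the thread. As a rough illustration of the same idea, here is a bash stand-in (not his script) that reads the date printed in each archived message page and sets the file's modification time to it with GNU touch -d. It assumes, without having been checked against every Mailman version, that the date is the text of the first line containing <I>...</I> in each page, as the stock Pipermail article template is assumed to produce; since Dragon had edited his templates, look at a page from your own archive first. The function name fix_page_dates is hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for the Perl script mentioned above.
# ASSUMPTION: each message page shows its date as the text of the first
# line containing <I>...</I>. Requires GNU touch (for "touch -d").
fix_page_dates() {
    local f d
    for f in "$@"; do
        # Text between <I> and </I> on the first line that has both.
        d=$(sed -n 's/.*<I>\(.*\)<\/I>.*/\1/p' "$f" | head -n 1)
        if [ -n "$d" ]; then
            # Set the file's modification time to the message date.
            touch -d "$d" "$f" || echo "could not parse date in $f: $d" >&2
        else
            echo "no date found in $f" >&2
        fi
    done
}

# Process any files given on the command line.
if [ $# -gt 0 ]; then
    fix_page_dates "$@"
fi
```

For example (script name and archive path hypothetical; Pipermail's numbered message pages sit in per-period directories):

    bash fix-dates.sh archives/private/listname/*/[0-9]*.html

Dates that touch cannot parse, such as unusual time-zone abbreviations, are reported on standard error and those files are left unchanged. Re-run htdig afterwards so its index picks up the corrected dates.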
[Mailman-Users] archiving partial duplicates
Hello,

I'm having trouble with one of my archives. This archive is about 5 years old and fairly large. I recently had to remove a message and reindex. When I checked the archive for the current month, a number of partial messages showed up with no From line and no date. These turned out to be chunks of some very old messages.

I am confused, though, because the original message shows up fine while the current month's archive shows an incomplete portion of it. I'm assuming it's a corrupt message, but I don't understand why it shows up in both places. I have a couple dozen of these, and I don't know whether they existed before my last bin/arch rebuild.

I have gone through several of these and can't see anything unusual. I have also checked the messages before and after the ones in question, to no avail. Any ideas on what to look for would be appreciated.

Con Wieland
Network and Academic Computing Services
Re: [Mailman-Users] archiving partial duplicates
Con Wieland sent the message below at 11:50 4/6/2006:
> I recently had to remove a message and reindex. When I checked the archives for the current month a number of partial messages showed up with no From line and no date.
> [snip]
End original message.

I am just kinda guessing here, but the help text for the arch script (bin/arch --help) has a caveat that it is not very memory-efficient and may not be able to index a large mbox file in a single gulp. That may or may not be the cause of your problem.

The same help text says you can give --start and --end article numbers to process just one chunk of the mbox file. You could then rebuild the archive in chunks, say 500 posts at a time, and see what happens. So if you decided 500 was a good number to index, for the first chunk you would do:

    bin/arch --wipe --start=1 --end=500 listname

For subsequent chunks you would do (adjusting the start and end indexes, of course):

    bin/arch --start=501 --end=1000 listname

Hope that helps.
Dragon

~~~ Venimus, Saltavimus, Bibimus (et naribus canium capti sumus) ~~~