Re: [Mailman-Users] archiving partial duplicates

2006-04-09 Thread Mark Sapiro
Con Wieland wrote:

> When I checked the archives for the
> current month a number of partial messages showed up with no from
> line and no date.
> [snip]
> Any ideas on what to look for would be appreciated.


Look for messages containing unescaped "From " lines in the body.

The bin/cleanarch script can help fix a .mbox file that
contains these.
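A minimal sketch for spotting such lines, assuming a POSIX shell and awk. `find_unescaped_from` is a hypothetical helper, not part of Mailman. It treats a "From " line as a real message separator only if it starts the file or follows a blank line, and is itself followed by a header-like "Name: value" line. Anything else is printed with its line number.

```sh
#!/bin/sh
# find_unescaped_from MBOX
# Hypothetical helper (not part of Mailman): print "LINE:TEXT" for every
# "From " line that does not look like a genuine mbox message separator.
find_unescaped_from() {
    awk '
        # The previous line was a candidate separator: it must be
        # followed by a header-like line, otherwise report it.
        pending != "" {
            if ($0 !~ /^[A-Za-z0-9-]+:/) print pending
            pending = ""
        }
        # A "From " line is only a candidate separator at the start of
        # the file or right after a blank line.
        /^From / {
            if (NR == 1 || prev == "") pending = NR ":" $0
            else print NR ":" $0
        }
        { prev = $0 }
        END { if (pending != "") print pending }
    ' "$1"
}
```

Run it against the list's mbox, e.g. `find_unescaped_from archives/private/listname.mbox/listname.mbox` from the Mailman installation directory. It is only a heuristic: a body line such as "From here on..." that follows a blank line and precedes a line like "Note: ..." will slip through. bin/cleanarch reads the mbox on standard input and writes the cleaned copy to standard output, so keep a backup of the original before replacing it.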

-- 
Mark Sapiro [EMAIL PROTECTED]   The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp


Re: [Mailman-Users] archiving partial duplicates

2006-04-07 Thread Patrick Bogen
On 4/6/06, Dragon [EMAIL PROTECTED] wrote:
> So if you decided 500 was a good number to index, for the first chunk
> you would do:
>
>  bin/arch --wipe --start=1 --end=500 listname
>
> For subsequent chunks you would do (adjusting the start and end
> indexes of course...):
>
>  bin/arch --start=501 --end=1000 listname


Bash can do this for you:

#!/bin/bash
bin/arch --wipe
for i in `seq 0 10`
do
bin/arch --start=$(( 1+$i*500 )) --end=$(( 500+$i*500 )) listname
done

...Assuming that bin/arch --wipe alone does what I think it does.
Also, if you have more than 5500 messages in the mbox file, you'll
need to adjust the second argument in the 'seq' upwards. And, of
course, replace 'listname' with the list name.
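In Mailman 2.1, bin/arch expects the list name as a positional argument, so a bare `bin/arch --wipe` will most likely just print a usage message rather than clear the archive. A minimal sketch that avoids relying on it passes --wipe only with the first chunk. `rebuild_in_chunks` is a hypothetical helper, and its 1-based, inclusive ranges follow the commands quoted above rather than anything verified against bin/arch itself.

```sh
#!/bin/sh
# rebuild_in_chunks LISTNAME TOTAL [CHUNK]
# Hypothetical helper: rebuild LISTNAME's archive CHUNK messages at a time
# (default 500). Only the first bin/arch call gets --wipe, so later chunks
# are added to the freshly rebuilt archive instead of wiping it again.
# Assumption: 1-based, inclusive --start/--end ranges, as in the commands
# quoted in this thread; confirm with "bin/arch --help" before relying on it.
# Set ARCH="echo bin/arch" to print the commands instead of running them.
rebuild_in_chunks() {
    local list="$1" total="$2" chunk="${3:-500}"
    local arch="${ARCH:-bin/arch}"
    local wipe=--wipe start=1 end
    while [ "$start" -le "$total" ]; do
        end=$(( start + chunk - 1 ))
        # $arch and $wipe are unquoted on purpose: ARCH may hold several
        # words, and an empty $wipe should disappear from the command.
        $arch $wipe --start="$start" --end="$end" "$list" || return 1
        wipe=
        start=$(( end + 1 ))
    done
}
```

For example, `rebuild_in_chunks listname 5200` runs eleven chunks, the last one ending at 5500. Running it once with `ARCH="echo bin/arch"` prints the commands without executing them, which is a cheap way to check the ranges first.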

--
- Patrick Bogen


Re: [Mailman-Users] archiving partial duplicates

2006-04-07 Thread Dragon
Patrick Bogen sent the message below at 08:18 4/7/2006:
> On 4/6/06, Dragon [EMAIL PROTECTED] wrote:
> > So if you decided 500 was a good number to index, for the first chunk
> > you would do:
> >
> >  bin/arch --wipe --start=1 --end=500 listname
> >
> > For subsequent chunks you would do (adjusting the start and end
> > indexes of course...):
> >
> >  bin/arch --start=501 --end=1000 listname
> >
>
> Bash can do this for you:
>
> #!/bin/bash
> bin/arch --wipe
> for i in `seq 0 10`
> do
>  bin/arch --start=$(( 1+$i*500 )) --end=$(( 500+$i*500 )) listname
> done
>
> ...Assuming that bin/arch --wipe alone does what I think it does.
> Also, if you have more than 5500 messages in the mbox file, you'll
> need to adjust the second argument in the 'seq' upwards. And, of
> course, replace 'listname' with the list name.
- End original message. -

Or Perl or Python or whatever your favorite scripting language might be...

But that's just icing on the cake and not really a necessity.

The --wipe argument deletes all the old files and builds new ones.
That is great, but it did have one unfortunate consequence for me. I
am using htdig to allow searching of my archives, and when I rebuilt
the archives after editing the templates and installing htdig, the
file dates on all of the message files were set to the date of the
rebuild. This destroyed the date context for the archive: the file
dates displayed in htdig search results no longer reflected when each
message was originally posted.

So, I wrote a small Perl script to fix the file dates to match the 
message date in each of the message files. If anyone is interested in 
that script, I would be happy to share it with you. Just e-mail me 
directly and I will send it to you.
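The Perl script itself is not shown in the thread. As a rough stand-in, here is a minimal shell sketch of the same idea. `set_mtime_from_page` is hypothetical, and it assumes (a) the posting date is the first `<I>...</I>` element on its own line in each message page, as in the default Mailman 2.1 message template, and (b) GNU touch, whose -d option parses dates like "Sun Apr  9 10:15:00 2006".

```sh
#!/bin/sh
# set_mtime_from_page FILE
# Hypothetical helper (not Dragon's Perl script): set the modification time
# of a pipermail message page to the posting date shown on that page.
# Assumptions: the date is the first <I>...</I> element standing on its own
# line, and touch is GNU touch (its -d option parses free-form dates).
set_mtime_from_page() {
    local d
    # Take the text between <I> and </I> on the first matching line.
    d=$(sed -n 's|^[[:space:]]*<I>\(.*\)</I>.*$|\1|p' "$1" | head -n 1)
    if [ -z "$d" ]; then
        echo "no date found in $1" >&2
        return 1
    fi
    touch -d "$d" "$1"
}
```

Applied across an archive, that might look like `for f in archives/private/listname/*/[0-9]*.html; do set_mtime_from_page "$f"; done`. Pages where no date is found are reported on standard error and left untouched.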

Dragon

~~~
  Venimus, Saltavimus, Bibimus (et naribus canium capti sumus)
~~~



[Mailman-Users] archiving partial duplicates

2006-04-06 Thread Con Wieland
Hello,

I'm having trouble with one of my archives. This archive is about 5  
years old and is fairly large. The problem is I recently had to  
remove a message and reindex. When I checked the archives for the  
current month a number of partial messages showed up with no from  
line and no date. These actually turned out to be chunks from some  
very old messages. I am confused though because the original message  
shows up fine and the current month's archive shows an incomplete  
portion of the old message. I'm assuming it's a corrupt message but I  
don't understand why it's showing up in both places. I have a couple  
dozen of these. I don't know if this existed before my last arch  
rebuild.  I have gone through several of these and am not able to see  
anything unusual. I have also checked the messages preceding and  
after the messages in question to no avail.

Any ideas on what to look for would be appreciated.

Con Wieland
Network and Academic Computing Services



Re: [Mailman-Users] archiving partial duplicates

2006-04-06 Thread Dragon
Con Wieland sent the message below at 11:50 4/6/2006:
> Hello,
>
> I'm having trouble with one of my archives. This archive is about 5
> years old and is fairly large. The problem is I recently had to
> remove a message and reindex. When I checked the archives for the
> current month a number of partial messages showed up with no from
> line and no date. These actually turned out to be chunks from some
> very old messages. I am confused though because the original message
> shows up fine and the current month's archive shows an incomplete
> portion of the old message. I'm assuming it's a corrupt message but I
> don't understand why it's showing up in both places. I have a couple
> dozen of these. I don't know if this existed before my last arch
> rebuild.  I have gone through several of these and am not able to see
> anything unusual. I have also checked the messages preceding and
> after the messages in question to no avail.
>
> Any ideas on what to look for would be appreciated.
- End original message. -

I am just kinda guessing here, but there is a caveat in the bin/arch
help text that the script is not very memory-efficient and may not be
able to index a large MBOX file in a single gulp. That may or may not
be the cause of the problem.

The same help text says that you can specify a start and end index for
a chunk of messages in the MBOX file (the --start and --end options).
You could then rebuild the archive in chunks, say 500 posts at a time,
to see what happens.

So if you decided 500 was a good number to index, for the first chunk 
you would do:

 bin/arch --wipe --start=1 --end=500 listname

For subsequent chunks you would do (adjusting the start and end 
indexes of course...):

 bin/arch --start=501 --end=1000 listname
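To pick chunk boundaries you need a rough idea of how many messages the MBOX file holds. A minimal sketch, assuming a POSIX shell. Both functions are hypothetical helpers, and the count is only approximate, because unescaped "From " lines in message bodies are counted as messages too.

```sh
#!/bin/sh
# count_mbox_messages MBOX
# Hypothetical helper: count lines beginning with "From ", which is how an
# mbox marks the start of each message. Unescaped "From " lines in bodies
# are counted as well, so treat the result as an upper bound.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# chunk_count TOTAL [CHUNK]
# Hypothetical helper: number of CHUNK-sized runs (default 500) needed to
# cover TOTAL messages, rounding up.
chunk_count() {
    local c="${2:-500}"
    echo $(( ($1 + c - 1) / c ))
}
```

For example, `chunk_count 5200` prints 11, which with 500-message chunks means end indexes up to 5500.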

Hope that helps.

Dragon

~~~
  Venimus, Saltavimus, Bibimus (et naribus canium capti sumus)
~~~
