Rob Tanner wrote:
>
>I am trying to rebuild archives -- actually porting archives over from 
>another machine and then doing a rebuild, but the problem below shows up 
>in all the archives.
>
>I run the command "bin/arch small_centers" and get the following error:
>
>#00000 <[EMAIL PROTECTED]>
>figuring article archives
>2008-February
>Pickling archive state into /var/lib/mailman/archives/private/small_centers/pipermail.pck
>Traceback (most recent call last):
>  File "bin/arch", line 200, in <module>
>    main()
>  File "bin/arch", line 188, in main
>    archiver.processUnixMailbox(fp, start, end)
>  File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 580, in processUnixMailbox
>    self.add_article(a)
>  File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 624, in add_article
>    author = fixAuthor(article.decoded['author'])
>  File "/usr/lib/mailman/Mailman/Archiver/pipermail.py", line 62, in fixAuthor
>    while i>0 and (L[i-1][0] in lowercase or
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 26: ordinal not in range(128)
>
>This actually looks like a problem in a specific email message in the 
>archive.  How do I identify the message, and how do I fix it?


If you're doing a 'rebuild', you probably want the --wipe option with
bin/arch, but that isn't the problem.

The message has a hex b5 byte (µ, the micro sign in Latin-1), I think in
the From: header, since the error is raised in fixAuthor(), which
processes the message's author.
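To see why one such byte is fatal, here is a minimal Python 3 illustration (Mailman 2.1 itself runs on Python 2, where the same ASCII decode happens implicitly when an 8-bit byte string is mixed with unicode). The helper name first_non_ascii and the sample header are hypothetical.

```python
def first_non_ascii(data):
    """Return (offset, byte value) of the first byte the ASCII codec
    rejects, or None if `data` is pure ASCII."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return exc.start, data[exc.start]
    return None


# Hypothetical From: header value containing a raw Latin-1 micro sign.
raw = b'John Doe \xb5Systems <jdoe@example.com>'

print(first_non_ascii(raw))  # (9, 181), i.e. byte 0xb5 at offset 9

# The same bytes decode fine as ISO-8859-1, where 0xb5 is U+00B5.
text = raw.decode('latin-1')
print('U+%04X' % ord(text[9]))  # U+00B5
```

The ASCII codec accepts only bytes 0x00-0x7F, so any 8-bit byte in an undeclared header raises exactly the "ordinal not in range(128)" error from the traceback.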

You can try the attached patch to bin/cleanarch, which makes cleanarch
report every header line containing non-ASCII bytes, along with its line
number in the mbox. It won't "fix" them, but it will tell you where they
are.

-- 
Mark Sapiro <[EMAIL PROTECTED]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

--- test-mailman-2.1/bin/cleanarch      2007-06-18 08:35:57.000000000 -0700
+++ bin/cleanarch       2008-02-06 14:39:35.859375000 -0800
@@ -117,6 +117,8 @@
     statuscnt = 0
     messages = 0
     prevline = None
+    inheaders = False
+    badre = re.compile(r'[\177-\377]')
     while True:
         lineno += 1
         line = sys.stdin.readline()
@@ -144,6 +146,7 @@
                 else:
                     # It's a valid Unix-From line
                     messages += 1
+                    inheaders = True
                     if output:
                         # Before we spit out the From_ line, make sure the
                         # previous line was blank.
@@ -154,9 +157,15 @@
             else:
                 # This is a bogus Unix-From line
                 escape_line(line, lineno, quiet, output)
-        elif output:
+        else:
             # Any old line
-            sys.stdout.write(line)
+            if len(line.strip('\r\n')) == 0:
+                inheaders = False
+            if inheaders:
+                if badre.search(line):
+                    print >> sys.stderr, 'Non-ascii in %s\nat line number %d' % (line, lineno)
+            if output:
+                sys.stdout.write(line)
         if status > 0 and (lineno % status) == 0:
             sys.stderr.write('#')
             statuscnt += 1
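For anyone who would rather not patch cleanarch, the header check the patch adds can be sketched as a standalone Python 3 script (the patch itself is Python 2). This is a sketch under stated assumptions, not part of Mailman: it treats any line starting with "From " as a message separator (a simplification; cleanarch also validates the rest of the Unix-From line), considers headers to end at the first blank line, and flags bytes 0x7F-0xFF, the same class as the patch's badre. The names find_non_ascii_headers, report and find_bad_headers.py are hypothetical.

```python
import re
import sys

# Same character class as the patch's badre: bytes 0x7F through 0xFF.
BAD = re.compile(rb'[\x7f-\xff]')


def find_non_ascii_headers(lines):
    """Yield (lineno, line) for header lines containing bytes >= 0x7F.

    `lines` is an iterable of bytes lines from a Unix mbox file.
    Simplification: any line beginning with b'From ' starts a new
    message, and its headers end at the first blank line.
    """
    inheaders = False
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(b'From '):
            # Unix-From separator: headers of a new message follow.
            inheaders = True
            continue
        if not line.strip(b'\r\n'):
            # A blank line ends the header block.
            inheaders = False
        elif inheaders and BAD.search(line):
            yield lineno, line


def report(path):
    """Print each offending header line in the mbox at `path` to stderr."""
    with open(path, 'rb') as fp:
        for lineno, line in find_non_ascii_headers(fp):
            # Latin-1 maps every byte to a character, so this never fails;
            # ascii() escapes the 8-bit characters for safe printing.
            text = line.rstrip(b'\r\n').decode('latin-1')
            print('Non-ascii at line %d: %s' % (lineno, ascii(text)),
                  file=sys.stderr)


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_bad_headers.py list.mbox
    report(sys.argv[1])
```

Like the patch, this only reports the offending lines; repairing each header is still a manual step.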
------------------------------------------------------
Mailman-Users mailing list
[email protected]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
