Re: A script for fixing bare newlines in mailbox files?

2007-01-12 Thread Joseph Brennan



--On Friday, January 12, 2007 12:18 -0500 Jorey Bump <[EMAIL PROTECTED]> 
wrote:




> Did any users report any further corruption of what is arguably already a
> corrupted message? I'm not familiar with the cause of this problem, but
> having encountered it before, mainly with messages that have large
> attachments, I'm wondering if attached files might be unusable after such
> a scrubbing (assuming they were not encoded properly).





No.  For a few years we have been refusing or rewriting messages
with nulls or bare returns, so only mail at least 4 years old was
involved.  Many of the messages involved were 10 years old or more.

Amateurish Windows-based mail-sending software is still in use that
sends junk like this.  From the lack of trouble reports, I think it
is mainly text parts that are affected.  Maybe the software writers
use standard modules to do the encoding, and those do it right.
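
A small sketch of why properly encoded attachments would be expected to survive this kind of scrubbing while unencoded binary data would not. It is not from the original post: the sample bytes and the string-level `scrub` helper are made up for illustration, and it assumes the attachment was encoded as base64 (here via the core MIME::Base64 module).

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical attachment bytes containing NUL, CR and LF.
my $raw = "PK\003\004\000\000\015\012binary\015payload\000";

# String-level variant of the scrubbing rules discussed in this thread
# (an assumption for illustration, not the posted script itself).
sub scrub {
    my ($text) = @_;
    $text =~ s/\000//g;         # drop NULs
    $text =~ s/\015\012/\n/g;   # CRLF -> LF
    $text =~ s/\015/\n/g;       # bare CR -> LF
    return $text;
}

# Base64 output uses only A-Z a-z 0-9 + / = and "\n".
my $encoded = encode_base64($raw);

# Encoded part: nothing for the scrub to change, so it still decodes to $raw.
my $encoded_survives = decode_base64(scrub($encoded)) eq $raw;

# Unencoded part: the scrub alters the bytes.
my $raw_survives = scrub($raw) eq $raw;

print "base64 part intact: ", ($encoded_survives ? "yes" : "no"), "\n";
print "raw part intact:    ", ($raw_survives     ? "yes" : "no"), "\n";
```

So the risk raised in the question would apply mainly to parts that were sent as raw 8-bit or binary data without a transfer encoding.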

Joseph Brennan
Lead Email Systems Engineer
Columbia University Information Technology




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: A script for fixing bare newlines in mailbox files?

2007-01-12 Thread Jorey Bump

Joseph Brennan wrote:


> When moving from U Wash to Cyrus we applied this rewrite to all
> mailboxes.  Get rid of any nulls while you're at it.
>
>
> while (my $line = <>) {
>
>    # The \000 character (NUL) is not allowed
>    if ($line =~ s/\000//g) {
>       print STDERR "WARNING: Removing NUL\n";
>    }
>
>    # Change CRLF or bare CR to LF
>    my ($endcr, $midcr) = (0, 0);
>    $endcr++ if ($line =~ s/\015$//);   # CRLF: \n already there
>    $midcr++ if ($line =~ s/\015/\n/g); # bare CR: add \n
>    if ($endcr || $midcr) {
>       print STDERR "WARNING: Correcting CR characters\n";
>    }
>
>    print $line;
> }


Did any users report any further corruption of what is arguably already 
a corrupted message? I'm not familiar with the cause of this problem, 
but having encountered it before, mainly with messages that have large 
attachments, I'm wondering if attached files might be unusable after 
such a scrubbing (assuming they were not encoded properly).





Re: A script for fixing bare newlines in mailbox files?

2007-01-12 Thread Joseph Brennan



--On Thursday, January 11, 2007 17:35 -0500 Zachariah Mully 
<[EMAIL PROTECTED]> wrote:



> Howdy all-
> We've been bitten by migrating some of our people from Outlook to
> Thunderbird, and then using Tbird to move their mail off their local
> machines onto the IMAP server where it belongs. Unfortunately we've not
> patched Cyrus to accept bare newlines, nor intend to... Since I have
> access to the local mailboxes does anybody have a perl script or
> something of the like that would remove the bare newlines from the raw
> mailbox files? My perl-fu sucks these days, and I've not been able to
> figure out where and how to remove them...



When moving from U Wash to Cyrus we applied this rewrite to all
mailboxes.  Get rid of any nulls while you're at it.


while (my $line = <>) {

   # The \000 character (NUL) is not allowed
   if ($line =~ s/\000//g) {
      print STDERR "WARNING: Removing NUL\n";
   }

   # Change CRLF or bare CR to LF
   my ($endcr, $midcr) = (0, 0);
   $endcr++ if ($line =~ s/\015$//);   # CRLF: \n already there
   $midcr++ if ($line =~ s/\015/\n/g); # bare CR: add \n
   if ($endcr || $midcr) {
      print STDERR "WARNING: Correcting CR characters\n";
   }

   print $line;
}
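
The loop above reads from the files named on the command line (or standard input) and writes to standard output, so each mailbox would need to be redirected to a new file. As a rough sketch of one way to apply the same substitutions to files in place, the helper below uses Perl's `$^I` in-place editing variable (the in-code equivalent of the `-i.bak` switch). The name `scrub_in_place` and the `.bak` suffix are assumptions for illustration, not part of the original post, and the sketch assumes a Unix-like system where files are read without newline translation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post.
# Rewrites each named file in place and keeps the original as FILE.bak.
# Returns the number of lines that were changed.
sub scrub_in_place {
    my @files = @_;
    return 0 unless @files;   # with an empty @ARGV, <> would read STDIN

    my $changed = 0;

    # $^I turns on in-place editing for <>, as the -i.bak switch would.
    local ($^I, @ARGV) = ('.bak', @files);

    while (my $line = <>) {
        my $before = $line;
        $line =~ s/\000//g;      # remove NULs
        $line =~ s/\015$//;      # CRLF: drop the CR, the \n is already there
        $line =~ s/\015/\n/g;    # bare CR: replace with \n
        $changed++ if $line ne $before;
        print $line;             # goes to the rewritten file, not STDOUT
    }
    return $changed;
}

# Example call (file names are placeholders):
#   my $n = scrub_in_place('Inbox', 'Sent');
#   print STDERR "changed $n line(s)\n";
```

Keeping the `.bak` copies makes it possible to compare a few messages before and after, and to roll back if an attachment turns out to have been stored unencoded.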


Joseph Brennan
Lead Email Systems Engineer
Columbia University Information Technology




Re: A script for fixing bare newlines in mailbox files?

2007-01-11 Thread Zachariah Mully

Andreas Winkelmann wrote:


> What do you mean by "bare newlines" in mailboxes.db?
>
> What does an Export as Textfile show?
>
> (as cyrus)
> $ ctl_mboxlist -d


Sorry, I meant the raw mailboxes used by Tbird on their own machines. 
Because they were originally using Outlook, many people inadvertently 
saved their mail on their local machines not on the server. We've moved 
them to Tbird which does a good job of importing the local mail from 
Outlook, but now we can't move the mailboxes to the server because they 
contain bare newlines. So short of patching Cyrus to accept these 
messages (which I'll only do as a last resort), I was wondering if 
anybody had a nice little piece of perl that I could let loose on these 
mailboxes to scrub the bare newlines from them.
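
Before rewriting anything, a read-only pass can show which local mailbox files actually contain the offending characters. A rough sketch, assuming the characters of interest are NUL, CRLF pairs, and CRs not followed by LF; the helper name `count_problems` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count problem characters in one file without
# modifying it. Returns a hash reference with keys nul, crlf and bare_cr.
sub count_problems {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my %count = (nul => 0, crlf => 0, bare_cr => 0);
    local $/ = "\012";                          # split on LF only
    while (my $line = <$fh>) {
        $count{nul} += ($line =~ tr/\000//);    # tr returns the match count

        my @crlf = $line =~ /\015\012/g;        # CR immediately before LF
        $count{crlf} += scalar @crlf;

        my @bare = $line =~ /\015(?!\012)/g;    # CR not followed by LF
        $count{bare_cr} += scalar @bare;
    }
    close $fh;
    return \%count;
}

# Example call (the file name is a placeholder):
#   my $c = count_problems('Inbox');
#   printf "%d NUL, %d CRLF, %d bare CR\n", @{$c}{qw(nul crlf bare_cr)};
```

Files that report zero for all three counts can be left alone.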


Thanks,
Z




Re: A script for fixing bare newlines in mailbox files?

2007-01-11 Thread Andreas Winkelmann
On Thursday 11 January 2007 23:35, Zachariah Mully wrote:

>   We've been bitten by migrating some of our people from Outlook to
> Thunderbird, and then using Tbird to move their mail off their local
> machines onto the IMAP server where it belongs. Unfortunately we've not
> patched Cyrus to accept bare newlines, nor intend to... Since I have
> access to the local mailboxes does anybody have a perl script or
> something of the like that would remove the bare newlines from the raw
> mailbox files? My perl-fu sucks these days, and I've not been able to
> figure out where and how to remove them...

What do you mean by "bare newlines" in mailboxes.db?

What does an Export as Textfile show?

(as cyrus)
$ ctl_mboxlist -d

-- 
Andreas



A script for fixing bare newlines in mailbox files?

2007-01-11 Thread Zachariah Mully
Howdy all-
We've been bitten by migrating some of our people from Outlook to
Thunderbird, and then using Tbird to move their mail off their local
machines onto the IMAP server where it belongs. Unfortunately we've not
patched Cyrus to accept bare newlines, nor intend to... Since I have
access to the local mailboxes does anybody have a perl script or
something of the like that would remove the bare newlines from the raw
mailbox files? My perl-fu sucks these days, and I've not been able to
figure out where and how to remove them...

Thanks,
Z

