Heya,

we had some problems with OTRS 2.0.4, which simply lost emails that were supposed to end up in one of our queues. I tracked the problem down to PostgreSQL rejecting the message body because it contained an invalid UTF-8 byte sequence. The cause was emails that were declared as UTF-8 but carried a disclaimer, appended by the customer's MTA, in Latin-1.
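To make the failure mode concrete, here is a minimal Perl sketch. The sample body and disclaimer strings are made up for illustration. It appends Latin-1 bytes to a UTF-8 body and checks both with a strict Encode::decode call using FB_CROAK. The concatenated string fails the check, which is the same kind of invalid sequence PostgreSQL refuses to store in a UTF-8 database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Made-up body declared as UTF-8: "Grüße" as UTF-8 bytes.
my $body = "Gr\xC3\xBC\xC3\x9Fe\n";

# Made-up disclaimer appended by a gateway in Latin-1 (raw 0xFC and 0xE4
# bytes) without re-encoding it or adding a separate MIME part.
my $disclaimer = "Diese E-Mail ist nur f\xFCr den Empf\xE4nger bestimmt.\n";

my $mail = $body . $disclaimer;

# Strict check: with FB_CROAK, decode() dies on the first malformed sequence.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ); 1 } ? 1 : 0;
}

print 'body alone:        ', ( is_valid_utf8($body) ? 'valid' : 'invalid' ), "\n";
print 'body + disclaimer: ', ( is_valid_utf8($mail) ? 'valid' : 'invalid' ), "\n";
```

The body on its own passes; only the appended Latin-1 bytes break it.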
After removing the quoted-printable (QP) encoding, OTRS noticed that the source charset and the target charset for the database (UTF-8) were the same and skipped re-encoding with Perl's Encode module. This sounds like a sane optimization, but the decode/re-encode step has the useful side effect of validating every byte sequence and replacing broken data with a valid replacement character. I removed the optimization, and the mails are now processed without problems.

I have attached two patches: the first applies to 2.0.4 and has been tested; the second applies to 2.2.5 but has not been tested. Please apply them, or at least add a configuration option that enforces re-encoding of mails.

Thanks,
Marc
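A minimal sketch of why the round trip helps, assuming a simplified convert routine that only mirrors the shortcut removed by the patch; it is not the actual Kernel::System::Encode code. The $ForceRecode flag is a hypothetical stand-in for the requested configuration option, not an existing OTRS setting. With the flag set, Encode::from_to decodes and re-encodes even when source and target charset are both UTF-8, and for a UTF-8 target the malformed Latin-1 byte comes out as U+FFFD (bytes EF BF BD) instead of being passed through.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the requested configuration option;
# NOT an existing OTRS setting.
my $ForceRecode = 1;

# Simplified convert routine modeled on the shortcut the patch removes.
sub convert {
    my (%Param) = @_;
    my $Text = $Param{Text};

    # Shortcut: identical charsets are passed through untouched,
    # so malformed bytes survive all the way to the database.
    if ( !$ForceRecode && lc( $Param{From} ) eq lc( $Param{To} ) ) {
        return $Text;
    }

    # Round trip: from_to() decodes with the default CHECK, which replaces
    # malformed input (U+FFFD for UTF-8) instead of keeping the raw bytes.
    Encode::from_to( $Text, $Param{From}, $Param{To} );
    return $Text;
}

# UTF-8 text followed by a raw Latin-1 byte (0xFC, "ü") from a disclaimer.
my $mail  = "Gr\xC3\xBC\xC3\x9Fe f\xFCr Sie\n";
my $fixed = convert( Text => $mail, From => 'UTF-8', To => 'UTF-8' );

# Verify the result with a strict decode on a copy.
my $copy  = $fixed;
my $valid = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ); 1 } ? 1 : 0;
print $valid ? "valid UTF-8 after re-encoding\n" : "still broken\n";
```

Setting $ForceRecode to 0 restores the shortcut, in which case the raw 0xFC byte reaches the database unchanged.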
Patch 1 (against OTRS 2.0.4, tested):

--- Kernel/System/Encode.pm 2008-03-11 14:14:03.000000000 +0100
+++ Kernel/System/Encode.pm 2008-03-11 14:19:44.000000000 +0100
@@ -179,33 +179,29 @@
if (!$Self->{CharsetEncodeSupported}) {
return $Param{Text};
}
- # if no encode is needed
- if ($Param{From} =~ /^$Param{To}$/i) {
- if ($Param{To} =~ /^utf(-8|8)/i) {
- Encode::_utf8_on($Param{Text});
- }
- return $Param{Text};
+ # always decode/encode, as some picky DB backends (like Postgres)
+ # fail horribly when trying to insert broken sequences. When recoding,
+ # Encode.pm does the right thing [tm] and replaces broken byte sequences
+ # by some safe character (usually '?'). Broken sequences usually happen
+ # when user mail has encoding A and some mail gateway appends some
+ # footer/disclaimer in encoding B (without doing the needed MIME magic)
+ if ($Param{Force}) {
+ Encode::_utf8_off($Param{Text});
+ }
+ if (! eval { Encode::from_to($Param{Text}, $Param{From}, $Param{To}) } ) {
+ print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
}
- # encode is needed
else {
- if ($Param{Force}) {
- Encode::_utf8_off($Param{Text});
- }
- if (! eval { Encode::from_to($Param{Text}, $Param{From}, $Param{To}) } ) {
- print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
- }
- else {
- # set utf-8 flag
- if ($Param{To} =~ /^utf(8|-8)$/i) {
- Encode::encode_utf8($Param{Text});
- Encode::_utf8_on($Param{Text});
- }
- if ($Self->{Debug}) {
- print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
- }
- }
- return $Param{Text};
+ # set utf-8 flag
+ if ($Param{To} =~ /^utf(8|-8)$/i) {
+ Encode::encode_utf8($Param{Text});
+ Encode::_utf8_on($Param{Text});
+ }
+ if ($Self->{Debug}) {
+ print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
+ }
}
+ return $Param{Text};
}
=item SetIO()
Patch 2 (against OTRS 2.2.5, untested):

--- Kernel/System/Encode.pm 2008-03-11 16:18:01.000000000 +0100
+++ Kernel/System/Encode.pm 2008-03-11 16:19:31.000000000 +0100
@@ -195,37 +195,33 @@
return $Param{Text};
}
- # if no encode is needed
- if ( $Param{From} =~ /^$Param{To}$/i ) {
- if ( $Param{To} =~ /^utf(-8|8)/i ) {
- Encode::_utf8_on( $Param{Text} );
- }
- return $Param{Text};
- }
- # encode is needed
+ # always decode/encode, as some picky DB backends (like Postgres)
+ # fail horribly when trying to insert broken sequences. When recoding,
+ # Encode.pm does the right thing [tm] and replaces broken byte sequences
+ # by some safe character (usually '?'). Broken sequences usually happen
+ # when user mail has encoding A and some mail gateway appends some
+ # footer/disclaimer in encoding B (without doing the needed MIME magic)
+ if ( $Param{Force} ) {
+ Encode::_utf8_off( $Param{Text} );
+ }
+ if ( !eval { Encode::from_to( $Param{Text}, $Param{From}, $Param{To} ) } ) {
+ print STDERR
+ "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
+ }
else {
- if ( $Param{Force} ) {
- Encode::_utf8_off( $Param{Text} );
- }
- if ( !eval { Encode::from_to( $Param{Text}, $Param{From}, $Param{To} ) } ) {
- print STDERR
- "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text}) not supported!\n";
- }
- else {
- # set utf-8 flag
- if ( $Param{To} =~ /^utf(8|-8)$/i ) {
+ # set utf-8 flag
+ if ( $Param{To} =~ /^utf(8|-8)$/i ) {
- # Encode::encode_utf8($Param{Text});
- Encode::_utf8_on( $Param{Text} );
- }
- if ( $Self->{Debug} ) {
- print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
- }
+ # Encode::encode_utf8($Param{Text});
+ Encode::_utf8_on( $Param{Text} );
+ }
+ if ( $Self->{Debug} ) {
+ print STDERR "Charset encode '$Param{From}' -=> '$Param{To}' ($Param{Text})!\n";
}
- return $Param{Text};
}
+ return $Param{Text};
}
=item SetIO()
_______________________________________________
OTRS mailing list: dev - Webpage: http://otrs.org/
Archive: http://lists.otrs.org/pipermail/dev
To unsubscribe: http://lists.otrs.org/cgi-bin/listinfo/dev
