On Mon Nov 09, 2009 at 20:13:44 +0000, Matt Sergeant wrote:
> Yeah there's plenty of ways to do it. Textcat being one of them. Google
> also has an API for language detection which is pretty good. I just
> wondered if anyone had anything pre-rolled for qpsmtpd.

I experimented with this in the past, but found that it wasn't such a useful test.

I tried using the Perl module Lingua::Identify to statistically evaluate the message body, and also using character set detection. Both approaches resulted in too many false positives to be useful.

(Using charset encodings was particularly prone to failure because Russian people on English mailing lists often have badly configured mail clients, and there are similar issues with other non-English speakers.)

Attached is my simple Subject: encoding plugin - it'd need cleanup to be general purpose, but it's almost there.

Steve
-- 
Debian GNU/Linux System Administration
http://www.debian-administration.org/

#!/usr/bin/perl -w
#
#  Detect if the message "looks foreign".
#
#  This means looking for non-English subject encodings.
#
# Steve
# --
#

use lib "/mf/lib/";

use strict;
use warnings;

use Qpsmtpd::Constants;

sub hook_data_post
{
    my ( $self, $transaction ) = @_;

    #
    #  If the mail is rejected terminate quickly
    #
    if ( ( $transaction->notes("reject") || 0 ) == 1 )
    {
        $self->log( LOGWARN,
                    $self->plugin_name() .
                      ": terminating as mail is already rejected" );
        return DECLINED;
    }

    #
    #  Get the domain this mail is for - this domain is setup
    # in the 'test_recipient' plugin
    #
    my $domain = $transaction->notes("domain") || undef;
    return DECLINED unless ( defined($domain) );

    #
    #  If this check isn't enabled then return immediately
    #
    if ( !( ( -e "/srv/$domain/checks/language" ) ||
            ( -e "/srv/$domain/checks/all" ) ) )
    {
        $self->log( LOGWARN,
                    $self->plugin_name() .
                      ": plugin disabled for domain $domain" );
        return DECLINED;
    }

    ##
    ##  Get the subject
    ##
    my $subject = $transaction->header->get('Subject') || "";

    #
    #  If it doesn't match the regexp it is either English, or we
    # missed something.
    #
    #  Either way we'll allow it.
    #
    if ( $subject !~
        /((re|aw|res|fw):?\s?)*=\?(GB2312|HZ-GB-2312|BIG5|euc_kr|euc-kr|iso-2022-kr|koi8-r|windows-125[145]|windows-1256|ks_c_5601-1987|iso-2022-jp)\?(B|Q)\?/io
       )
    {

        #
        #  If it is English then we're fine.
        #
        return DECLINED;
    }

    #
    #  OK we matched the regexp.  Reject the mail.
    #
    $transaction->notes( "reject",  1 );
    $transaction->notes( "blocker", $self->plugin_name() );
    $transaction->notes( "reason",  "English mail only please" );

    return DECLINED;
}
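
For anyone who wants to see what the Subject: regexp above catches without wiring it into qpsmtpd, here is a rough standalone sketch. It copies the same pattern into a small helper; the name looks_foreign() and the sample subjects are made up for illustration and are not part of the plugin. It assumes the header value is still in its raw encoded-word form, which is what the plugin's regexp relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the plugin, compiled once with qr// (case-insensitive).
my $foreign_re = qr/((re|aw|res|fw):?\s?)*=\?(GB2312|HZ-GB-2312|BIG5|euc_kr|euc-kr|iso-2022-kr|koi8-r|windows-125[145]|windows-1256|ks_c_5601-1987|iso-2022-jp)\?(B|Q)\?/i;

# Hypothetical helper: returns 1 if the raw Subject header contains an
# encoded-word in one of the listed charsets, 0 otherwise.
sub looks_foreign
{
    my ($subject) = @_;
    return 0 unless defined $subject;
    return ( $subject =~ $foreign_re ) ? 1 : 0;
}

# Illustrative subjects only; the encoded payloads are placeholders.
my @samples = ( 'Hello from the list',
                'Re: =?koi8-r?B?8NLJ18XU?=',
                'FW: =?GB2312?B?xOO6ww==?=',
                '=?UTF-8?B?SGVsbG8=?=',
                '=?ISO-8859-1?Q?Caf=E9?=', );

foreach my $subject (@samples)
{
    printf "%-32s %s\n", $subject,
      looks_foreign($subject) ? "reject" : "accept";
}
```

With these samples the koi8-r and GB2312 subjects are flagged and the rest pass. Note that a subject encoded as UTF-8 passes whatever language it is written in, which is one of the "we missed something" cases the comment in the plugin mentions.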