Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 09:42:59AM -0500, Wizard wrote:
> I'm writing a script to do email filtering, and I have some questions:
> 1. These are all valid email addresses: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] - this is valid, but not standard

What's non-standard about it? (apart from the spelling)

> The problem is that last one with DOMAIN.CC format. Not having a TLD is NOT TYPICAL for UK domains (there are only 7?), but are there other non-US countries using a similar format?

Indeed it's not typical, but it is standard in that it has all the right records in all the right places like the DNS*. And yes, PLENTY of other countries have longword.cc-style domains. France, for instance, and Germany, and Canada.

* - I assume, I haven't bothered looking it up

> This is what I am expecting for international emails, and others will presently fail:
                                  ^ it always makes me laugh when people misuse this word to mean "not in my country".

> USER_NAME[_W_DOTS*]@CNAMES*.DOMAIN.TLD.CC[:PORT?], an extreme being like this: [EMAIL PROTECTED]:8021, where 'TLD' is required if 'CC' is present. Is this a problem?

Yes, it's a problem. It won't work.

> 2. Do email addresses ever have port numbers appended, like this: [EMAIL PROTECTED]:24

If they do, then they're invalid AFAIK.

> 3. Are there any US domains that don't look like this: CNAMES*.DOMAIN.TLD[:PORT?]

Yes, anything in .us. In general, trying to divine what is and is not a valid mail domain without looking it up in the DNS is doomed to failure.

--
David Cantrell | Sysadmin/programmer for hire | [EMAIL PROTECTED]
One person can change the world, but most of the time they shouldn't -- Marge Simpson
___
Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Email filtering...
On Tue, 2003-02-11 at 15:49, darren chamberlain wrote:
> * Simon Wilcox <essuu at ourshack.com> [2003-02-11 10:47]:
> > We can actually use CPAN modules (as long as they are pure perl) but we need to distribute them as part of the code.
>
> Well, then, bundle in Email::Valid, because it's wonderful, and is (more or less) the definitive way to do what it does.

We should certainly be looking at it. No point reinventing wheels etc etc :) Over to Grant, he's the lead on this particular part of the NMS project.

Simon.
Re: [Boston.pm] Email filtering...
On Tue, 11 Feb 2003, darren chamberlain wrote:
> > *@aol.* *@*.parliment.uk fred@*.sourceforge.* etc.
>
> Hmm, nice:
>
>   # Assume @addrs is the above list:
>   for (my $i = 0; $i < @addrs; $i++) {
>       $addrs[$i] =~ s/\*/.*/g;
>       $addrs[$i] =~ s/\./\\./g;
>       $addrs[$i] = qr($addrs[$i]);
>   }
>
> You've got a list of regexes. Cool.

Except for one thing: you'll want to swap those first two statements in the for loop -- otherwise you'll end up escaping your regex '.'s that replaced the wildcard '*'s...

--
Steve Reppucci [EMAIL PROTECTED] | Logical Choice Software http://logsoft.com/ |
=-=-=-=-=-=-=-=-=-=- My God! What have I done? -=-=-=-=-=-=-=-=-=-=
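The swap Steve describes can be sketched as follows. This is a minimal illustration and not code from the thread: the helper name `wildcard_to_regex`, the sample candidate addresses, and the decision to anchor each regex to the whole address are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one '*'-wildcard address pattern to a regex.
sub wildcard_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\./\\./g;    # escape literal dots first ...
    $pattern =~ s/\*/.*/g;     # ... then expand '*', so its '.' stays unescaped
    return qr/^$pattern$/;     # anchoring is an assumption: match the whole address
}

my @addrs   = ('*@aol.*', '*@*.parliment.uk', 'fred@*.sourceforge.*');
my @regexes = map { wildcard_to_regex($_) } @addrs;

# Made-up candidate addresses, only for demonstration.
for my $candidate ('bob@aol.com', 'fred@cvs.sourceforge.net', 'bob@example.org') {
    my $hit = grep { $candidate =~ $_ } @regexes;
    print "$candidate: ", ($hit ? 'matched' : 'no match'), "\n";
}
```

Only '.' and '*' are treated specially here; a pattern containing other regex metacharacters would need fuller escaping.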
RE: [Boston.pm] Email filtering...
Did you volunteer for this? Did anyone laugh when you spoke up? Just in my head. They're always laughing at me. What? I couldn't KILL HER... Grant M. ( Co.)
Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 11:12:28AM -0500, darren chamberlain wrote:
> * Stephen Reppucci <sgr at logsoft.com> [2003-02-11 11:00]:
> > Except for one thing: you'll want to swap those first two statements in the for loop -- otherwise you'll end up escaping your regex '.'s that replaced the wildcard '*'s...
>
> Damn. I did that in my test code, too, but pasted the wrong one in. Good catch.

Something like this,

  my @local_parts =
      sort { ($a eq '*' ? 2 : $a =~ /\*/ ? 1 : 0) <=> ($b eq '*' ? 2 : $b =~ /\*/ ? 1 : 0) }
      # your source

That puts '*' last, something with * next, and anything else first. Add map qr/^$_$/ to taste.

P

--
Paul Makepeace ... http://paulm.com/
If (x == 1), then that must make me a goddess. -- http://paulm.com/toys/surrealism/
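A runnable version of that ordering might look like the sketch below. The `wildcard_rank` helper, the sample `@source` list, and the alphabetical tie-break are assumptions added for illustration; only the 0/1/2 ranking comes from the message above.

```perl
use strict;
use warnings;

# Rank a pattern: 0 = no wildcard, 1 = contains '*', 2 = a bare '*'.
sub wildcard_rank {
    my ($pattern) = @_;
    return $pattern eq '*' ? 2 : $pattern =~ /\*/ ? 1 : 0;
}

# Hypothetical local parts standing in for "your source".
my @source = ('*', 'fred', 'web*', 'postmaster');

# Most specific patterns first; ties are broken alphabetically (an assumption).
my @local_parts =
    sort { wildcard_rank($a) <=> wildcard_rank($b) or $a cmp $b } @source;

print join(', ', @local_parts), "\n";
```

Sorting this way means an exact entry is tried before a catch-all '*', so the most specific rule gets the first chance to match.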
Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 11:10:51AM -0500, Wizard wrote:
> Here's an example that I just sent to the NMS list: I want it to be universal. For instance, how would you parse this: *@*.aol.* into: [EMAIL PROTECTED] [EMAIL PROTECTED] and [EMAIL PROTECTED] It should match the first two, but not the last, because the DOMAIN in the last is 'parliment', not 'aol'.

I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'. How do you account for the first '.' in the match expression?

-Gyepi
Re: [Boston.pm] Email filtering...
On Tuesday, February 11, 2003, at 12:23 PM, Gyepi SAM wrote:
> I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'. How do you account for the first '.' in the match expression?

For that matter, can a regular expression validly begin with * at all? What does that mean? And why would you want to match a string of zero or more @ characters?
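The question about a leading '*' can be checked directly. The sketch below assumes the patterns are meant as shell-style wildcards and not as regexes; the sample pattern and address are made up. It shows Perl refusing to compile the raw string, and a translated form that does compile.

```perl
use strict;
use warnings;

my $glob = '*@aol.com';

# Used directly as a regex, the leading '*' has nothing to quantify,
# so compilation dies; the eval block traps that error.
my $raw = eval { qr/$glob/ };
print "raw glob compiles: ", (defined $raw ? 'yes' : 'no'), "\n";

# Translated form: escape everything, then turn each escaped '*' into '.*'.
(my $source = quotemeta $glob) =~ s/\\\*/.*/g;
my $translated = qr/^$source$/;
print "translated matches: ",
    ('someone@aol.com' =~ $translated ? 'yes' : 'no'), "\n";
```

Read as a regex, the `@*` in '*@*.aol.*' would indeed mean "zero or more @ characters", which is why the wildcard has to be translated before it is compiled.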
Re: [Boston.pm] damian here in sept. 2003
On Tue, 11 Feb 2003 [EMAIL PROTECTED] wrote:
> if this is the menu: http://www.yetanother.org/damian/seminars/ then I'll take 1 Perl6 2 Sky 3 Multimethods 4 Everyday 5 Extreme in that order.

Not that we're in a hurry, but I might as well start stuffing the ballot:

1 Perligata
2 Perligata
3 Perligata
4 Perligata
5 Perligata

:)

--
Chris Devers [EMAIL PROTECTED]
stability, n. [Latin stabulum a pothouse, haunt, brothel.] 1 A nirvana-type situation that calls for drinks and layoffs all round. 2 The period between crashes.
-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995
RE: [Boston.pm] Email filtering...
Sorry I took so long to get back, I was at an interview.

> > I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'. How do you account for the first '.' in the match expression?
>
> For that matter, can a regular expression validly begin with * at all? What does that mean? And why would you want to match a string of zero or more @ characters?

Apparently, I'm not really explaining myself well. The match criteria are external, in a user-defined config file, like this:

  BANNED_USER = *@aol.com, *@*.sourceforge.net, *@*.microsoft.*, etc.

The cfg entries are then split into @user, @cnames, $domain, $tld, and $cc if pertinent. THEN they are s/\*/\.\*/g, so you end up with matches like this (PSEUDOCODE):

  $user =~ /^$cfg_user$/

and if $user is [EMAIL PROTECTED] then I never get to checking the cnames, because the .com and the aol match first. Even if they didn't, I would not check the cnames because they aren't defined in the user email.

I hope that clarifies things better,
Grant M.
Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 06:27:15PM -0500, Uri Guttman wrote:
> "BR" == Bob Rogers [EMAIL PROTECTED] writes:
>
> BR> > I would like to point out that your code can be improved by replacing
> BR> > uses of $& and $` with parentheses in the regexes followed by $1 and
> BR> > $2. This is from the Devel::SawAmpersand doc . . .
>
> BR> I was unaware of this issue; thanks for bringing it to my attention.
> BR> But Devel::SawAmpersand doesn't really explain the problem in any kind
> BR> of depth, and just talks about massive in-memory copying. So,
> BR> presumably, this is just a question of efficiency?
>
> the problem only happens with s///. if you use $&, then all uses of s/// must do a full copy of the original string in case parts of it are referred to by $& (and friends). the s/// could change the string and so a copy must be made. this is true for all instances of s/// in your program. if you use parens then only those instances will need extra copies.

Just to clarify, there is a problem with any code that does anything to any of the three variables, if that code is intended for use in programs that may use regexes heavily. Since a module or library should not assume things about what kind of programs will use it, essentially all code should avoid them. In one-liners without performance concerns, they are not a problem.

More details are in `perldoc perlvar` and the perl5-porters mailing list archives.

--
John Tobey [EMAIL PROTECTED]
\^-^
/\ /\
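One way to get the same three pieces without touching $`, $& or $' is to slice the string using the match offsets in @- and @+, which `perldoc perlvar` documents as equivalents. The sketch below is only an illustration; the helper name `match_parts` and the sample address are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return (prematch, match, postmatch) for the first
# match of $re in $text, or an empty list when there is no match.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my ($start, $end) = ($-[0], $+[0]);        # offsets of the whole match
    return (
        substr($text, 0, $start),              # what $` would hold
        substr($text, $start, $end - $start),  # what $& would hold
        substr($text, $end),                   # what $' would hold
    );
}

my ($pre, $match, $post) = match_parts('user@example.com', qr/\@/);
print "pre=$pre match=$match post=$post\n";
```

Because the three variables never appear in the source, code written this way should not impose the copying cost described above on the rest of the program.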
Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 03:16:32PM -0500, Bob Rogers wrote:
> Also, I've attached a code fragment I wrote to validate email address syntax.

Thanks very much for the code sample and link to Bernstein's great RFC 822 info site.

I would like to point out that your code can be improved by replacing uses of $& and $` with parentheses in the regexes followed by $1 and $2. This is from the Devel::SawAmpersand doc:

  * never use $& and friends in a library.
  * Don't "use English" in a library, because it contains the three bad fellows. Corollary: if you really want to use English, do it like so:
      use English qw( -no_match_vars ) ;
  * before you release a module or program, check if sawampersand is set by any of the modules you use or require.

  Fortunately perl offers easy to use alternatives, that is

    instead of this        you can use this
    $` of /pattern/        $1 of /(.*?)pattern/s
    $& of /pattern/        $1 of /(pattern)/
    $' of /pattern/        $+ of /pattern(.*)/s

  In general, apply /^(.*)(pattern)(.*)$/s and use $1 for $`, $2 for $& and $+ for $' ($+ is not dependent on the number of parens in the original pattern). Note that the /s switch can alter the meaning of . in your pattern.

Best
-John

> if ($component =~ /$rfc822_illegal_atom_character+/o) {
>     # one-based for the user, plus the local-part length for hosts.
>     my $bad_start = length($`)+$offset;
>     # [we'd like to get $bad_chars in the message. but the only reliable
>     # way to avoid confusing the browser is to render them in hex, which the
>     # user is not likely to understand. -- rgr, 14-Dec-98.]
>     my $bad_chars = $&;
>
> elsif ($result =~ /\*\*\*.*/) {
>     # error message in MX retrieval; probably no such domain.
>     address_error ("Our server cannot figure out how to send mail to <tt>&quot;"
>         . html_quotify($domain_name) . "&quot;</tt>. Here is the "
>         . "error returned by the name server:\n<blockquote>"
>         . html_quotify($&)

--
John Tobey [EMAIL PROTECTED]
\^-^
/\ /\
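Applied to the first quoted fragment, the capture-based substitution might look like the sketch below. The character class is an assumed stand-in for `$rfc822_illegal_atom_character`, whose real definition is not shown in the thread, and the helper name and sample input are hypothetical.

```perl
use strict;
use warnings;

# Assumed stand-in for the thread's $rfc822_illegal_atom_character.
my $illegal = qr/[\s()<>\@,;:]/;

# Hypothetical helper: return the one-based start position and the text of
# the first run of illegal characters, or an empty list if there is none.
sub find_illegal_run {
    my ($component) = @_;
    # $1 plays the role of $`, $2 the role of $&.
    return unless $component =~ /^(.*?)($illegal+)/s;
    return (length($1) + 1, $2);
}

my ($bad_start, $bad_chars) = find_illegal_run('foo bar');
printf "illegal run '%s' starts at position %d\n", $bad_chars, $bad_start;
```

The non-greedy `(.*?)` keeps the prefix as short as possible, so `$2` is the first illegal run, just as `$&` would have been for the unanchored pattern.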
[Boston.pm] RE: Email filtering example code...
Here's some sample code to give people a better idea of what I am trying to do (it's not pretty). It will parse the email address given as I had planned, but doesn't actually do any comparisons yet. My primary reason for wanting to do this is DWIM for the user, where [EMAIL PROTECTED] will not be disallowed by *@tobago.*, whose intent is to filter tobago.com, tobago.net, etc., not prodigy.net.

==CUT FROM HERE
#!/usr/bin/perl
use strict;
use warnings;

# This would actually come from a .cfg file
my $users = "fred\@fsck.com, john\@ted.com, dev\@null.*, weiner\@fred.*";
my $user = shift; # get email from shell - prog.pl [EMAIL PROTECTED]
my $valid_tlds = '[com][gov][org][edu][net][biz][arpa][int][nato][info][name][museum][coop][aero][pro][co][mil]';
my $bool = check_user( $user, $users ); # call sub

sub check_user {
    my $user = shift @_;
    my $blocked_users = shift @_;
    my( @bad_users, $name, $tmp, @cnames, $dn, $tld, $country, $port );
    # split the comma-separated config value into individual patterns
    @bad_users = split /\s*,\s*/, $blocked_users;
    ($name, $tmp) = split /\@/, $user;
    @cnames = split /\./, $tmp;
    $tld = pop @cnames;
    # a last label shorter than three characters is taken to be a country code
    if( length( $tld ) < 3 ) {
        $country = $tld;
        $tld = pop @cnames;
        # DOMAIN.CC with no TLD: the popped label is really the domain name
        if( $valid_tlds !~ /\[\Q$tld\E\]/ ) {
            $dn = $tld;
            $tld = $country;
        }
    }
    $dn = pop @cnames if !$dn;
    $dn = $tld if !$dn;
    print "NAME: $name\n" if $name;
    print "CNAMES: ", join( '.', @cnames ), "\n";
    print "DOMAIN: $dn\n";
    print "TLD: $tld\n" if defined $tld;
    print "COUNTRY: $country\n" if defined $country;
} # end check_user()
=CUT TO HERE

Each of the @bad_users entries is parsed exactly the same, with '*'s replaced with '.*'s, and then each component would be matched against the corresponding user component, like so:

  $user_tld =~ /^$blocked_tld$/ if $user_tld;
  $user_dn =~ /^$blocked_dn$/ if $user_dn;
  ...

Hope that helps,
Grant M.
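The component-by-component comparison described above could be sketched roughly as follows. This is not Grant's implementation: the helper names, the shortened TLD list, the simplified country-code rule, and the choice to skip components the pattern leaves undefined are all assumptions, and addresses are assumed to be well formed.

```perl
use strict;
use warnings;

# Assumed, shortened TLD list; the real one would come from configuration.
my %generic_tld = map { $_ => 1 } qw(com gov org edu net biz info co mil);

# Split an address (or a '*' pattern) into name, cnames, domain, tld, country.
sub parse_address {
    my ($address) = @_;
    my ($name, $host) = split /\@/, $address, 2;
    my @labels = split /\./, $host;
    my %part   = (name => $name);
    my $last   = pop @labels;
    # Simplified rule: a two-letter label after a generic TLD is a country code.
    if (length($last) == 2 && @labels && $generic_tld{ $labels[-1] }) {
        $part{country} = $last;
        $last = pop @labels;
    }
    $part{tld}    = $last;
    $part{domain} = @labels ? pop @labels : '';
    $part{cnames} = join '.', @labels;
    return \%part;
}

# Match one component against a '*' wildcard.
sub glob_match {
    my ($glob, $text) = @_;
    (my $re = quotemeta $glob) =~ s/\\\*/.*/g;
    return $text =~ /^$re$/ ? 1 : 0;
}

# An address is blocked when every component the pattern defines matches.
sub is_blocked {
    my ($address, $pattern) = @_;
    my ($user, $rule) = (parse_address($address), parse_address($pattern));
    for my $key (qw(name domain tld cnames country)) {
        next unless defined $rule->{$key} && length $rule->{$key};
        my $value = defined $user->{$key} ? $user->{$key} : '';
        return 0 unless glob_match($rule->{$key}, $value);
    }
    return 1;
}

for my $address ('joe@tobago.com', 'joe@tobago.prodigy.net') {
    print "$address: ", (is_blocked($address, '*@tobago.*') ? 'blocked' : 'allowed'), "\n";
}
```

Comparing the DOMAIN component on its own is what keeps '*@tobago.*' from catching a host that merely has 'tobago' as a subdomain label.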
Re: [Boston.pm] Email filtering...
On Tue, 11 Feb 2003, Bob Rogers wrote:
> From: John Tobey [EMAIL PROTECTED]
> > I would like to point out that your code can be improved by replacing uses of $& and $` with parentheses in the regexes followed by $1 and $2. This is from the Devel::SawAmpersand doc . . .
>
> I was unaware of this issue; thanks for bringing it to my attention. But Devel::SawAmpersand doesn't really explain the problem in any kind of depth, and just talks about massive in-memory copying. So, presumably, this is just a question of efficiency?

Yep.

If you've got a copy of _Mastering Regular Expressions_, look up Perl's $&, $`, and $' in the index (especially pp. 273-278) -- there's a long section explaining that any use of these constructs can have severe performance penalties on your code, for a lot of technical reasons that I won't try to summarize in this message (though others may want to).

The short of it is, if you can get away with it, to *never* use these anywhere in your programs (and by proxy to that, don't use modules that use these constructs, such as Carp.pm or English.pm).

This is based on the first edition of the book, which was written in 1997 against whatever version of Perl was contemporary at the time (5.4x?). For NMS this might be an appropriate constraint, but for anyone else the advice may have been superseded by _MRE, 2nd ed._ and later versions of Perl -- I'm not sure.

--
Chris Devers [EMAIL PROTECTED]
NIH, adj. [Abbrev. Not Invented Here.] Pertaining to a much respected and widely practiced branch of design philosophy, unique among philosophical schools in that, by definition, the adherents refuse to talk to one another. Motivated by a fanatical hatred of plagiarism, NIH followers selflessly limit the domain of their responsibilities to their own humble artifacts. See also WHEEL.
-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995
Re: [Boston.pm] Email filtering...
On Tue, Feb 11, 2003 at 06:26:56PM -0500, Chris Devers wrote:
> The short of it is, if you can get away with it, to *never* use these anywhere in your programs (and by proxy to that, don't use modules that use these constructs, such as Carp.pm or English.pm). This is based on the

English yes, but Carp?

--
John Tobey [EMAIL PROTECTED]
\^-^
/\ /\
Re: [Boston.pm] damian here in sept. 2003
"CD" == Chris Devers [EMAIL PROTECTED] writes:

CD> On Tue, 11 Feb 2003 [EMAIL PROTECTED] wrote:
> > if this is the menu: http://www.yetanother.org/damian/seminars/

well, i meant which classes you want him to teach but selecting his pm talk is fine too. :)

CD> Not that we're in a hurry, but I might as well start stuffing the ballot:
CD> 1 Perligata
CD> 2 Perligata
CD> 3 Perligata
CD> 4 Perligata
CD> 5 Perligata

ballot stuffing detected. your votes are disqualified and you are banned from any future damian talks.

uri

--
Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com
- Stem and Perl Development, Systems Architecture, Design and Coding
Search or Offer Perl Jobs http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class
Re: [Boston.pm] Perl v. Java...
On Tuesday, February 11, 2003, at 12:02 PM, Sherm Pendley wrote:
> On Monday, February 10, 2003, at 11:04 PM, Erik Price wrote:
> > Java doesn't let you just bust out what's on your mind -- I've discovered that you really have to plan out your application's public interface
>
> You say that as if it's a bad thing, or as if it applies only to Java.

Sorry, didn't mean to give that impression. I don't think it's a bad thing at all, nor do I think it only applies to Java, or even only programming.

Erik

--
Erik Price
email: [EMAIL PROTECTED]
jabber: [EMAIL PROTECTED]