Re: [Boston.pm] Email filtering...

2003-02-11 Thread David Cantrell
On Tue, Feb 11, 2003 at 09:42:59AM -0500, Wizard wrote:
 I'm writing a script to do email filtering, and I have some questions:
 1. These are all valid email addresses:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED] - this is valid, but not standard

What's non-standard about it? (apart from the spelling)

 The problem is that last one with DOMAIN.CC format. Not having a TLD is NOT
 TYPICAL for UK domains (there are only 7?), but are there other non-US
 countries using a similar format?

Indeed it's not typical, but it is standard in that it has all the right
records in all the right places like the DNS*.  And yes, PLENTY of other
countries have longword.cc-style domains.  France, for instance, and
Germany, and Canada.

* - I assume, I haven't bothered looking it up

This is what I am expecting for
 international emails, and others will presently fail:
  ^
it always makes me laugh when people misuse this word to mean not in my
country.

 USER_NAME[_W_DOTS*]@CNAMES*.DOMAIN.TLD.CC[:PORT?], an extreme being like
 this:
 [EMAIL PROTECTED]:8021, where 'TLD' is required
 if 'CC' is present.
 Is this a problem?

Yes, it's a problem.  It won't work.

 2. Do email addresses ever have port numbers appended, like this:
 [EMAIL PROTECTED]:24

If they do, then they're invalid AFAIK.

 3. Are there any US domains that don't look like this:
 CNAMES*.DOMAIN.TLD[:PORT?]

Yes, anything in .us.

In general, trying to divine what is and is not a valid mail domain without
looking it up in the DNS is doomed to failure.
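
A minimal sketch of what "looking it up in the DNS" could mean in Perl, assuming the CPAN module Net::DNS is installed (it is not part of core Perl). The name has_mx and the optional $lookup argument are hypothetical, added only so the check can be tried without network access. Note that a domain with no MX record can still receive mail through its address record, so an empty result is a hint rather than proof.

```perl
use strict;
use warnings;

# Returns the number of MX records found for $domain (0 means none).
# $lookup is a hypothetical injection point; by default it calls
# Net::DNS::mx(), which returns a list of MX records for the name.
sub has_mx {
    my ($domain, $lookup) = @_;
    $lookup ||= sub {
        require Net::DNS;            # assumption: Net::DNS is installed
        return Net::DNS::mx($_[0]);
    };
    my @records = $lookup->($domain);
    return scalar @records;
}
```

Called as has_mx('example.com'), it performs a real query through the default resolver.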

-- 
David Cantrell | Sysadmin/programmer for hire | [EMAIL PROTECTED]

One person can change the world, but most of the time they shouldn't
-- Marge Simpson
___
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm



Re: [Boston.pm] Email filtering...

2003-02-11 Thread Simon Wilcox
On Tue, 2003-02-11 at 15:49, darren chamberlain wrote:
 * Simon Wilcox <essuu at ourshack.com> [2003-02-11 10:47]:
  We can actually use CPAN modules (as long as they are pure perl) but
  we need to distribute them as part of the code.
 
 Well, then, bundle in Email::Valid, because it's wonderful, and is (more
 or less) the definitive way to do what it does.

We should certainly be looking at it. No point reinventing wheels etc
etc :)

Over to Grant, he's the lead on this particular part of the NMS project.

Simon.



Re: [Boston.pm] Email filtering...

2003-02-11 Thread Stephen Reppucci
On Tue, 11 Feb 2003, darren chamberlain wrote:

  *@aol.*
  *@*.parliment.uk
  fred@*.sourceforge.*
  etc.

 Hmm, nice:

   # Assume @addrs is the above list:
   for (my $i = 0; $i < @addrs; $i++) {
   $addrs[$i] =~ s/\*/.*/g;
   $addrs[$i] =~ s/\./\\./g;
   $addrs[$i] = qr($addrs[$i]);
   }

 You've got a list of regexes.  Cool.

Except for one thing: you'll want to swap those first two statements
in the for loop -- otherwise you'll end up escaping your regex '.'s that
replaced the wildcard '*'s...
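
A small sketch of the conversion with the escaping done first, as suggested above. It uses quotemeta instead of escaping only the dots, which is a swapped-in variation rather than the code from the thread; the name glob_to_regex is hypothetical.

```perl
use strict;
use warnings;

# Convert a shell-style address pattern into an anchored regex.
# Everything is escaped first, then each escaped '*' becomes '.*',
# so the dots introduced by the wildcard are never escaped.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;   # escapes '.', '@', '*' and other metacharacters
    $re =~ s/\\\*/.*/g;         # turn each escaped '*' back into a wildcard
    return qr/^$re$/;
}

my @addrs   = ('*@aol.*', '*@*.parliment.uk', 'fred@*.sourceforge.*');
my @regexes = map { glob_to_regex($_) } @addrs;
```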

-- 
Steve Reppucci   [EMAIL PROTECTED] |
Logical Choice Software  http://logsoft.com/ |
=-=-=-=-=-=-=-=-=-=-  My God!  What have I done?  -=-=-=-=-=-=-=-=-=-=




RE: [Boston.pm] Email filtering...

2003-02-11 Thread Wizard
 Did you volunteer for this? Did anyone laugh when you spoke up?

Just in my head. They're always laughing at me. What? I couldn't KILL HER...
Grant M. (& Co.)





Re: [Boston.pm] Email filtering...

2003-02-11 Thread Paul Makepeace
On Tue, Feb 11, 2003 at 11:12:28AM -0500, darren chamberlain wrote:
 * Stephen Reppucci <sgr at logsoft.com> [2003-02-11 11:00]:
  Except for one thing: you'll want to swap those first two statements
  in the for loop -- otherwise you'll end up escaping your regex '.'s
  that replaced the wildcard '*'s...
 
 Damn.  I did that in my test code, too, but pasted the wrong one in.
 Good catch.

Something like this,

my @local_parts = sort { ($a eq '*' ? 2 : $a =~ /\*/ ? 1 : 0)
   <=>
 ($b eq '*' ? 2 : $b =~ /\*/ ? 1 : 0) }
  # your source

That puts '*' last, something with * next, and anything else first.
Add map qr/^$_$/ to taste.
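
A runnable version of that ordering, as a sketch. The sample list @source and the helper name specificity are hypothetical, and the cmp tie-break is an addition so that the result does not depend on sort stability.

```perl
use strict;
use warnings;

# Rank a local-part pattern: 0 = no wildcard, 1 = contains '*', 2 = bare '*'.
sub specificity {
    my ($p) = @_;
    return $p eq '*' ? 2 : $p =~ /\*/ ? 1 : 0;
}

my @source = ('*', 'fred', 'j*', 'dev');   # hypothetical sample patterns

# Most specific patterns first, bare '*' last; ties broken alphabetically.
my @local_parts =
    sort { specificity($a) <=> specificity($b) or $a cmp $b } @source;
```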

P

-- 
Paul Makepeace ... http://paulm.com/

If (x == 1), then that must make me a goddess.
   -- http://paulm.com/toys/surrealism/



Re: [Boston.pm] Email filtering...

2003-02-11 Thread Gyepi SAM
On Tue, Feb 11, 2003 at 11:10:51AM -0500, Wizard wrote:
 Here's an example that I just sent to the NMS list:
 
 I want it to be universal. For instance, how would you parse this:
 *@*.aol.*
 
 into:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 and
 [EMAIL PROTECTED]
 
 It should match the first two, but not the last, because the DOMAIN in the
 last is 'parliment', not 'aol'.

I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'.
How do you account for the first '.' in the match expression?

-Gyepi



Re: [Boston.pm] Email filtering...

2003-02-11 Thread Ron Newman

On Tuesday, February 11, 2003, at 12:23  PM, Gyepi SAM wrote:

I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'.
How do you account for the first '.' in the match expression?


For that matter, can a regular expression validly begin with * at all?
What does that mean?

And why would you want to match a string of zero or more @ characters?
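
For what it's worth, Perl itself rejects a pattern that starts with a bare quantifier. A short sketch that shows this, with the glob kept in a string so the failure happens at run time and can be caught:

```perl
use strict;
use warnings;

my $pattern = '*@aol.com';   # a shell-style glob, not a regex

# Compiling the glob unchanged dies with "Quantifier follows nothing".
my $compiled = eval { qr/$pattern/ };
my $error    = $@;
print "not a valid regex: $error" unless defined $compiled;
```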




Re: [Boston.pm] damian here in sept. 2003

2003-02-11 Thread Chris Devers
On Tue, 11 Feb 2003 [EMAIL PROTECTED] wrote:

 if this is the menu:
 http://www.yetanother.org/damian/seminars/

 then I'll take
   1 Perl6
   2 Sky
   3 Multimethods
   4 Everyday
   5 Extreme
 in that order.

Not that we're in a hurry, but I might as well start stuffing the ballot:

 1 Perligata
 2 Perligata
 3 Perligata
 4 Perligata
 5 Perligata

:)


-- 
Chris Devers    [EMAIL PROTECTED]

stability, n. [Latin stabulum a pothouse, haunt, brothel.]
1 A nirvana-type situation that calls for drinks and layoffs all round.
2 The period between crashes.

-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995



RE: [Boston.pm] Email filtering...

2003-02-11 Thread Wizard
Sorry I took so long to get back, I was at an interview
  I don't see how '*@*.aol.*' can match '[EMAIL PROTECTED]'.
  How do you account for the first '.' in the match expression?

 For that matter, can a regular expression validly begin with * at all?
 What does that mean?

 And why would you want to match a string of zero or more @ characters?


Apparently, I'm not really explaining myself well. The match criteria are
external, in a user-defined config file, like this:
BANNED_USER = *@aol.com, *@*.sourceforge.net, *@*.microsoft.*, etc.

The cfg entries are then split into @user, @cnames, $domain, $tld, and $cc
if pertinent. THEN they are s/\*/\.\*/g. so you end up with matches like
this (PSEUDOCODE):
$user =~ /^$cfg_user$/ and if $user is [EMAIL PROTECTED] then I never get to
checking the cnames, because the .com and the aol match first. Even if they
didn't, I would not check the cnames because they aren't defined in the user
email.

I hope that clarifies things better,
Grant M.






Re: [Boston.pm] Email filtering...

2003-02-11 Thread John Tobey
On Tue, Feb 11, 2003 at 06:27:15PM -0500, Uri Guttman wrote:
  BR == Bob Rogers [EMAIL PROTECTED] writes:
 
 
   BR I would like to point out that your code can be improved by replacing
   BR uses of $& and $` with parentheses in the regexes followed by $1 and
   BR $2.  This is from the Devel::SawAmpersand doc . . .
 
   BR I was unaware of this issue; thanks for bringing it to my attention.
   BR But Devel::SawAmpersand doesn't really explain the problem in any kind
   BR of depth, and just talks about massive in-memory copying.  So,
   BR presumably, this is just a question of efficiency?
 
 the problem only happens with s///. if you use $&, then all uses of s///
 must do a full copy of the original string in case parts of it are
 referred to by $& (and friends). the s/// could change the string and so
 a copy must be made. this is true for all instances of s/// in your
 program. if you use parens then only those instances will need extra
 copies.

Just to clarify, there is a problem with any code that does anything
to any of the three variables, if that code is intended for use in
programs that may use regexes heavily.  Since a module or library
should not assume things about what kind of programs will use it,
essentially all code should avoid them.  In one-liners without
performance concerns, they are not a problem.

More details are in `perldoc perlvar` and the perl5-porters mailing
list archives.
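
As a sketch of one way to avoid the three variables entirely: perlvar describes the offset arrays @- and @+ and gives substr equivalents for the prematch, match, and postmatch. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

my $text = 'user name@example.com';   # hypothetical input
my ($prematch, $match, $postmatch);

if ($text =~ /\s+/) {
    # $-[0] is the offset where the match starts, $+[0] where it ends.
    $prematch  = substr($text, 0, $-[0]);              # same text as $`
    $match     = substr($text, $-[0], $+[0] - $-[0]);  # same text as $&
    $postmatch = substr($text, $+[0]);                 # same text as $'
}
```

With these, something like length($`) becomes simply $-[0].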

-- 
John Tobey [EMAIL PROTECTED]
\^-^
/\  /\



Re: [Boston.pm] Email filtering...

2003-02-11 Thread John Tobey
On Tue, Feb 11, 2003 at 03:16:32PM -0500, Bob Rogers wrote:

Also, I've attached a code fragment I wrote to validate email address
 syntax.

Thanks very much for the code sample and link to Bernstein's great RFC
822 info site.

I would like to point out that your code can be improved by replacing
uses of $& and $` with parentheses in the regexes followed by $1 and
$2.  This is from the Devel::SawAmpersand doc:

*   never use $& and friends in a library.

*   Don't use English in a library, because it contains the three bad
fellows. Corollary: if you really want to use English, do it like
so:

use English qw( -no_match_vars ) ;

*   before you release a module or program, check if sawampersand is set
by any of the modules you use or require.

Fortunately perl offers easy to use alternatives, that is

   instead of this            you can use this

 $`   of   /pattern/          $1   of  /(.*?)pattern/s
 $&   of   /pattern/          $1   of  /(pattern)/
 $'   of   /pattern/          $+   of  /pattern(.*)/s

In general, apply /^(.*)(pattern)(.*)$/s and use $1 for $`, $2
for $& and $+ for $' ($+ is not dependent on the number of parens
in the original pattern). Note that the /s switch can alter the
meaning of . in your pattern.
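
A sketch of that general recipe as a helper. The name split_on_first_match and the sample input are hypothetical; it uses the non-greedy (.*?) from the table so that the first occurrence is found, and it assumes the pattern passed in has no capture groups of its own (otherwise the third group would no longer be $3).

```perl
use strict;
use warnings;

# Returns (before, matched, after) for the first match of $pattern in
# $string, or an empty list if there is no match.
sub split_on_first_match {
    my ($string, $pattern) = @_;
    return unless $string =~ /^(.*?)($pattern)(.*)$/s;
    return ($1, $2, $3);
}

my ($before, $bad, $after) =
    split_on_first_match('bad<char@example.com', qr/[<>()]+/);
my $bad_start = length($before) + 1;   # one-based position of the bad run
```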

Best
-John

 if ($component =~ /$rfc822_illegal_atom_character+/o) {
   # one-based for the user, plus the local-part length for hosts.
   my $bad_start = length($`)+$offset;
   # [we'd like to get $bad_chars in the message.  but the only reliable
   # way to avoid confusing the browser is to render them in hex, which the
   # user is not likely to understand.  -- rgr, 14-Dec-98.]
   my $bad_chars = $&;


   elsif ($result =~ /\*\*\*.*/) {
   # error message in MX retrieval; probably no such domain.
   address_error
   ("Our server cannot figure out how to send mail to <tt>&quot;"
. html_quotify($domain_name) . "&quot;</tt>.  Here is the "
. "error returned by the name server:\n<blockquote>"
. html_quotify($&)

-- 
John Tobey [EMAIL PROTECTED]
\^-^
/\  /\



[Boston.pm] RE: Email filtering example code...

2003-02-11 Thread Wizard
Here's some sample code to give people a better idea of what I am trying to
do (it's not pretty). It will parse the email address given as I had
planned, but doesn't actually do any comparisons yet. My primary reason for
wanting to do this is DWIM for the user, where [EMAIL PROTECTED] will
not be disallowed by *@tobago.* whose intent is to filter tobago.com,
tobago.net, etc, not prodigy.net.
==CUT FROM HERE
#!/usr/bin/perl
use strict;
use warnings;

# This would actually come from a .cfg file
my $users = "fred\@fsck.com, john\@ted.com, dev\@null.*, weiner\@fred.*";

my $user = shift; # get email from shell - prog.pl [EMAIL PROTECTED]

# Bracketed list of known TLDs. A label in front of a country code that
# is not listed here is treated as the domain name instead.
my $valid_tlds =
"[com][gov][org][edu][net][biz][arpa][int][nato][info][name][museum][coop][aero][pro][co][mil]";

my $bool = check_user( $user, $users ); # call sub

sub check_user {
    my $user = shift @_;
    my $blocked_users = shift @_;
    my( @bad_users, $name, $tmp, @cnames, $dn, $tld, $country, $port );
    $blocked_users =~ s/\,\s*/+/g;
    foreach( split /\+/, $blocked_users ) {
        push @bad_users, $_;
    }
    ($name, $tmp) = split /\@/, $user;
    @cnames = split /\./, $tmp;
    $tld = pop @cnames;
    if( length( $tld ) < 3 ) {  # two-letter suffix: treat it as a country code
        $country = $tld;
        $tld = pop @cnames;
        if( $valid_tlds !~ /\[$tld\]/ ) {
            $dn = $tld;
            $tld = $country;
        }
    }
    $dn = pop @cnames if !$dn;
    $dn = $tld if !$dn;
    print "NAME: $name\n" if $name;
    print "CNAMES: ", join( '.', @cnames ), "\n";
    print "DOMAIN: $dn\n";
    print "TLD: $tld\n" if defined $tld;
    print "COUNTRY: $country\n" if defined $country;
} # end check_user()
=CUT TO HERE
Each of the @bad_users entries is parsed exactly the same with '*'s replaced
with '.*'s, and then each component would be matched against the
corresponding user component, like so:
$user_tld =~ /^$blocked_tld$/ if $user_tld;
$user_dn =~ /^$blocked_dn$/ if $user_dn;
...
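
A simplified sketch of that comparison step. It only splits each address into local part and domain rather than into cnames, domain, TLD, and country, so it is a stand-in for the idea and not the planned implementation; the names component_regex and is_blocked are hypothetical.

```perl
use strict;
use warnings;

# Turn one glob component ('*', 'j*', 'tobago.*') into an anchored regex.
sub component_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;
    $re =~ s/\\\*/.*/g;       # each escaped '*' becomes a wildcard
    return qr/^$re$/;
}

# True if $address matches the blocked pattern $glob, comparing the
# local part and the domain separately.
sub is_blocked {
    my ($address, $glob) = @_;
    my ($user_name, $user_domain) = split /\@/, $address, 2;
    my ($glob_name, $glob_domain) = split /\@/, $glob,    2;
    return 0 unless defined $user_domain && defined $glob_domain;
    return $user_name   =~ component_regex($glob_name)
        && $user_domain =~ component_regex($glob_domain) ? 1 : 0;
}
```

Because each side is anchored on its own, a pattern such as *@tobago.* can only match on the domain side and never on a local part that happens to contain "tobago".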
Hope that helps,
Grant M.





Re: [Boston.pm] Email filtering...

2003-02-11 Thread Chris Devers
On Tue, 11 Feb 2003, Bob Rogers wrote:

From: John Tobey [EMAIL PROTECTED]

I would like to point out that your code can be improved by replacing
uses of $& and $` with parentheses in the regexes followed by $1 and
$2.  This is from the Devel::SawAmpersand doc . . .

 I was unaware of this issue; thanks for bringing it to my attention. But
 Devel::SawAmpersand doesn't really explain the problem in any kind of
 depth, and just talks about massive in-memory copying.  So,
 presumably, this is just a question of efficiency?

Yep. If you've got a copy of _Mastering Regular Expressions_, look up
Perl's $&, $`, and $' in the index (especially pp. 273-278) -- there's a
long section explaining that any use of these constructs can have severe
performance penalties on your code, for a lot of technical reasons that I
won't try to summarize in this message (though others may want to).

The short of it is, if you can get away with it, to *never* use these
anywhere in your programs (and by proxy to that, don't use modules that
use these constructs, such as Carp.pm or English.pm). This is based on the
first edition of the book, which was written in 1997 against whatever
version of Perl was contemporary at the time (5.4x?). For NMS this might
be an appropriate constraint, but for anyone else the advice may have been
superseded by _MRE, 2nd ed._ and later versions of Perl -- I'm not sure.



-- 
Chris Devers    [EMAIL PROTECTED]

NIH, adj. [Abbrev. Not Invented Here.]
Pertaining to a much respected and widely practiced branch of design
philosophy, unique among philosophical schools in that, by
definition, the adherents refuse to talk to one another. Motivated by a
fanatical hatred of plagiarism, NIH followers selflessly limit the
domain of their responsibilities to their own humble artifacts. See
also WHEEL.

-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995



Re: [Boston.pm] Email filtering...

2003-02-11 Thread John Tobey
On Tue, Feb 11, 2003 at 06:26:56PM -0500, Chris Devers wrote:
 
 The short of it is, if you can get away with it, to *never* use these
 anywhere in your programs (and by proxy to that, don't use modules that
 use these constructs, such as Carp.pm or English.pm). This is based on the

English yes, but Carp?

-- 
John Tobey [EMAIL PROTECTED]
\^-^
/\  /\



Re: [Boston.pm] damian here in sept. 2003

2003-02-11 Thread Uri Guttman
 CD == Chris Devers [EMAIL PROTECTED] writes:

  CD On Tue, 11 Feb 2003 [EMAIL PROTECTED] wrote:
   if this is the menu:
   http://www.yetanother.org/damian/seminars/

well, i meant which classes you want him to teach but selecting his pm
talk is fine too. :)

  CD Not that we're in a hurry, but I might as well start stuffing the ballot:

  CD  1 Perligata
  CD  2 Perligata
  CD  3 Perligata
  CD  4 Perligata
  CD  5 Perligata

ballot stuffing detected. your votes are disqualified and you are banned
from any future damian talks.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
- Stem and Perl Development, Systems Architecture, Design and Coding 
Search or Offer Perl Jobs    http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class







Re: [Boston.pm] Perl v. Java...

2003-02-11 Thread Erik Price

On Tuesday, February 11, 2003, at 12:02  PM, Sherm Pendley wrote:


On Monday, February 10, 2003, at 11:04 PM, Erik Price wrote:


Java doesn't let you just bust out what's on your mind -- I've  
discovered that you really have to plan out your application's public 
 interface

You say that as if it's a bad thing, or as if it applies only to Java.


Sorry, didn't mean to give that impression.  I don't think it's a bad 
thing at all, nor do I think it only applies to Java, or even only 
programming.






Erik




--
Erik Price

email: [EMAIL PROTECTED]
jabber: [EMAIL PROTECTED]
