Help with regex

Will of Thornhenge Fri, 01 Aug 2003 18:06:57 -0700

I'm doing some work with mail headers that involves converting timestamps to a standard format. The regex below works except for one pesky trailing close paren.

Here's a sample of the data that causes problems:

==sample data
Date: Fri, 1 Aug 1997 08:10:16 -0700 (PDT)<br>
====
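
For context, the in-place conversion step mentioned below does roughly the following. This is only a simplified sketch, not the code from my script: stamp() and %mon are stand-in names, and it assumes the day, month, year and hh:mm:ss always appear in that order.

```perl
use strict;
use warnings;

# Month abbreviation -> month number
my %mon = (Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
           Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12);

# stamp() is a hypothetical helper used only for this illustration.
# It rewrites "Fri, 1 Aug 1997 08:10:16" as "19970801.081016" and
# leaves the numeric zone and the abbreviation for a later step.
sub stamp {
    my ($line) = @_;
    $line =~ s{(?:[A-Z][a-z]{2},\s+)?       # optional weekday, dropped
               (\d{1,2})\s+                 # $1 day
               ([A-Z][a-z]{2})\s+           # $2 month abbreviation
               (\d{4})\s+                   # $3 year
               (\d\d):(\d\d):(\d\d)}        # $4 $5 $6 time of day
              {exists $mon{$2}
                  ? sprintf('%04d%02d%02d.%02d%02d%02d',
                            $3, $mon{$2}, $1, $4, $5, $6)
                  : $&}xe;                  # unknown month: leave untouched
    return $line;
}

print stamp('Date: Fri, 1 Aug 1997 08:10:16 -0700 (PDT)<br>'), "\n";
```

With the sample line this prints "Date: 19970801.081016 -0700 (PDT)<br>", which is the kind of string the regex then has to deal with.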

The timestamp is first converted in place to YYYYMMDD.hhmmss format (so the sample becomes "19970801.081016 -0700 (PDT)"), and the result is then fed to this regex:

==code extract
# handle YYYYMMDD.hhmmss +0530 (IST) and similar
while (/\b
        (                   # $1 to $old
         (\d{8}\.\d{6})     # $2 to datestamp
         \s+
         ([-+]?\d\d\d\d)    # $3 to $timezone
         ( \s+ [(]?         # $4 if there is an abbrev,
          [A-Z]{2,5}        # like EST or (EST)
          [)]? )?           # then just get rid of it
        )
       \b/x ) {
   my ($old, $d1, $z1, ) = ($1, $2, $3, );
   if (exists $timeZones{$z1}) {
      my $z2 = $timeZones{$z1};  # obtain the abbreviation
      $z1 = $timeZones{$z2};     # then the numeric value for the abbrev
      my $d2 = date2Epoch($d1) + 3600 * ($tz - $z1);
      s/\Q$old\E/'_' . epoch2Date($d2) . ' ' . $tzabbrev/e;
   }
   else {
      s/\Q$old\E/_$old/;   # just mark it unchanged
   }
}
s/_(\d{8}\.\d{6})/$1/g;    # clean up markers
return $_;
====

The output I'm getting is

==converted sample
Date: 19970801.071016 PST)<br>
====

The problem is that closing paren, which survives the substitution. It is not being included in $1, which becomes $old. How can I force its inclusion, and why is the regex not behaving greedily?
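
Boiled down to a standalone test, the behavior can be reproduced as follows. This is a minimal sketch under assumptions: the input string is my reconstruction of what reaches the regex after the in-place conversion, old_for() is a throwaway helper name, and the second pattern is just an experiment that swaps the trailing \b for a negative lookahead, not necessarily the right fix.

```perl
use strict;
use warnings;

# Assumed intermediate form of the sample line.
my $line = 'Date: 19970801.081016 -0700 (PDT)<br>';

# Same shape as the pattern in the extract, trailing \b included.
my $with_b  = qr/\b((\d{8}\.\d{6})\s+([-+]?\d\d\d\d)(\s+[(]?[A-Z]{2,5}[)]?)?)\b/;

# Experiment: trailing \b replaced by "no word character comes next".
my $with_la = qr/\b((\d{8}\.\d{6})\s+([-+]?\d\d\d\d)(\s+[(]?[A-Z]{2,5}[)]?)?)(?!\w)/;

# Hypothetical helper: return what $1 (i.e. $old) would capture.
sub old_for {
    my ($re) = @_;
    return $line =~ $re ? $1 : undef;
}

print "trailing \\b : [", old_for($with_b),  "]\n";
print "lookahead   : [", old_for($with_la), "]\n";
```

The first line of output shows [19970801.081016 -0700 (PDT] and the second [19970801.081016 -0700 (PDT)]. The patterns differ only at the very end, which suggests the trailing \b is what makes [)]? give the paren back: there is no word boundary between ")" and "<" because neither is a word character, so the only way the match can succeed is to back off to the boundary between "T" and ")".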

--
Will Woodhull
[EMAIL PROTECTED]


_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
