Re: Chomp to trim '\r'

2003-10-08 Thread Will of Thornhenge


Carl Jolley wrote:

After all, all binmode is, is a way
to turn off perl's default way of handling the "\r" for you.
It's your way of saying to perl "Thanks, but no thanks for your
offer of assistance. For this file, I can handle this issue without
your help".


While this is true within the context of this discussion (newline 
conversions), binmode does other things that can also be very important. 
For instance, I found out several years ago that it was necessary to use 
binmode when opening some Word97 files in Perl scripts (under Win98), or 
some of the files would be truncated in the proprietary MS header. As I 
recall, the specific problem was a \000 byte early in the header that 
Perl would treat as an eof (unless binmode was in use), but I understand 
that binmode also turns off Perl's default handling of embedded Ctrl-Z 
and other control characters.

So I'm not disagreeing with you, but simply pointing out that whenever 
the input file is in a proprietary format that mixes text with binary 
segments, binmode is something to think about. At least under Windows, 
DOS, and related OSs.
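
For anyone who wants to see the habit in code, here is a minimal sketch, not a tested recipe for Word files. It assumes the usual Windows text-mode behavior (CRLF translated to LF, and a Ctrl-Z byte, 0x1A, treated as end of file); slurp_binary is a hypothetical helper name, not anything from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;     # no newline translation, no Ctrl-Z-as-EOF handling
    local $/;        # slurp mode: read everything in one go
    my $data = <$fh>;
    close $fh;
    return $data;
}
```

On Unix-like systems binmode makes no difference for plain byte reads, so calling it unconditionally costs nothing and keeps the script portable.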

And then there's the whole disciplines thing that can be done with 
binmode (or a 3 argument open). But I don't go there; it is my fervent 
hope that Perl 6.0 will fix up all that stuff before I have to deal with 
any multibyte unicode data.  :-)

--
Will Woodhull
[EMAIL PROTECTED]
___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Re: Regex Help Needed

2003-09-02 Thread Will of Thornhenge
Have you tried playing around with character sets? Something like

$target = 'mevqgn';
$length_target = length $target;
if ( $LS_Val =~ /-{1,2}[$target]{$length_target}/ ) {
   #do something
}
Whether the above would work for you would depend on whether the code 
can ignore positive matches on $LS_Val = '--mmmqqq' and so forth. It 
might be worthwhile to look more closely at the data and see whether 
there are "don't care" cases that you can ignore.

If there are not, then there is a loop approach:

my $t = 'mevqgn';   # letters still to be matched
my $x = $LS_Val;
if ( $x =~ /(-{1,2})/ ) {
   my $goodSoFar = $1;
   # extend the match one letter at a time, dropping each letter once used
   while ( length $t and $x =~ /(\Q$goodSoFar\E([$t]))/ ) {
      $goodSoFar = $1;
      $t =~ s/$2//;
   }
   do_Something() unless length $t;
}
That's undoubtedly slower than your original approach, but would be more 
versatile and possibly easier to maintain.

(Neither snippet above has been tested)
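
For the subroutine asked for in the quoted message, here is a sketch that uses a different technique from the two snippets above: it sorts the letters on both sides and compares the results, so no list of combinations is needed. The name is_letter_permutation is made up, and the sketch assumes the whole value is one or two dashes followed by the letters (the original regex was not anchored).

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $value is '-' or '--' followed by some
# ordering of exactly the letters in $letters (case-insensitive), else undef.
sub is_letter_permutation {
    my ($value, $letters) = @_;
    return undef unless $value =~ /^-{1,2}([a-z]+)$/i;
    my $got  = join '', sort split //, lc $1;
    my $want = join '', sort split //, lc $letters;
    return $got eq $want ? 1 : undef;
}

print "match\n" if is_letter_permutation('--ngqvem', 'mevqgn');
```

Because the letters are compared as sorted strings, a value such as '--mmmqqq' is rejected, which the character-class snippet cannot do.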

Dax T. Games wrote:
I have a list of characters.  I need to get a list of all possible 
sequences of these characters, for example: 
 
I have a string that consists of '-mevqgn' I need to pattern match any 
combination of 'mevqgn' with a preceding - or --.
 
Right now this is what I am doing but it is very ugly and difficult to 
come up with the combinations and it makes my brain hurt!:
 
 if ($LS_Val =~ /-{1,2}(mevqgn|
   emvqgn|evmqgn|evqmgn|evqgmn|evqgnm|
   veqgnm|vqegnm|vqgenm|vqgnem|vqgnme|
   qvgnme|qgvnme|qgnvme|qgnmve|qgnmev|
   gqmnev|gmqnev|gmnqev|gmneqv|gmnevq|
   mgnevq|mngevq|mnegvq|mnevgq|mnevqg|
   nmevqg|nemvqg|nevmqg|nevqmg|nevqgm|
   envqgm|evnqgm|evqngm|evqgnm|evqgmn
   )/ix)
{
#Do Something;
}
   
 
A subroutine that takes the string of characters as an argument and then 
returns 1 on success and undef on fail would be ideal for my purpose.
 
 
Any help is appreciated.
 
Dax


--
Will Woodhull
[EMAIL PROTECTED]


Re: empty versus zero

2003-08-14 Thread Will of Thornhenge
Lynn. Rickards wrote:
Thanks Will but I'll own up to maybe not reading the spec closely 
enough. Is the string " " to be considered empty? It passes defined() 
nevertheless...as does the empty string ''. Embarrassing.


No, you were correct; the context that was supplied showed that the 
"empty" array elements had to be populated with nulls. The quick test 
code I wrote up gave me results that looked like what I've seen in 
sparse arrays, and I screwed up in not recognizing that split can't 
generate a sparse array.
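
To make the point concrete, here is a small sketch of what I should have tested. split returns empty strings for empty fields, and those are defined, so the count has to compare against ''. The -1 limit is there because split otherwise drops trailing empty fields. count_empty_fields is a made-up name, and the tab-separated layout (probe id first) is taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: count empty tab-separated fields after the probe id.
# Zeros are not counted, because '0' is not the empty string.
sub count_empty_fields {
    my ($line) = @_;
    chomp $line;
    # A negative limit keeps trailing empty fields instead of discarding them.
    my ($probe_id, @values) = split /\t/, $line, -1;
    return scalar grep { $_ eq '' } @values;
}

print count_empty_fields("p1\t1\t\t0\t\t\n"), "\n";   # prints 3
```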

My apologies to the list. [Note to self: stop trying to interleave 
perl-win32-users interactions with other activities. Remember that the 
questions may be more subtle than they first appear.]

--
Will Woodhull
[EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of
Will of Thornhenge
Sent: Wednesday, August 13, 2003 8:36 PM
To: [EMAIL PROTECTED]
Cc: 'David Byrne'
Subject: Re: empty versus zero
I prefer Lynn. Rickards' method to the others I've seen mentioned. 
Testing whether the value is defined will always work quietly and 
swiftly; several of the other tests proposed will generate warnings 
under some conditions, which can really bog down a loop.

Lynn. Rickards wrote:




I think this is a fairly simple question...
How can I count empty values in an array?  This count
should not include zeros or non-empty values.  Below
is my current script, but it isn't working properly.
I appreciate any assistance that you may provide.

Thank you,
David
#!perl -w
# Count missing values
while (my $line = <>) {
  chomp ($line);
  my ($probe_id,$expression) = split /\t/, $line, 2;
  my @expression = split /\t/, $expression;
  my $count = 0;
  foreach my $value (@expression) {
#   if (!$value) {
One way could be to use defined()

	$count += 1 unless defined($value);


#   }
  }
  print "$count\n";
}
Another way, utilizing the magic "for" and "defined" do with "$_":

for (@expression) { $count++ if not defined}
print $count, "\n";
So lots of different ways to do it.

--
Will Woodhull
[EMAIL PROTECTED]


Re: CSV munging and "uninitialized values"

2003-08-14 Thread Will of Thornhenge


Terry Fowler wrote:
"$Bill Luebkert" wrote:

for (@line .. 41) {
   $line[$_] = '';
}


I've seen this use of ".." only a few times before and
don't really know what it's all about. I don't even 
know what to look for in the Llama book - "them two dots"?

Terry Fowler

It is called the "range operator". In the Camel book, 3rd edition, it is discussed on p. 103.

While its use is straightforward in list context, its behavior in scalar 
context surprises me.
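
A short sketch of both behaviors, for comparison. In list context the operator counts from the left value to the right value. In scalar context it acts as a flip-flop that turns on when the left test succeeds and off after the right test succeeds. The sub name lines_between and the BEGIN/END markers are invented for the illustration.

```perl
use strict;
use warnings;

# List context: pad an array out to index 5, as in the loop quoted above.
my @line = ('a', 'b', 'c');
$line[$_] = '' for scalar(@line) .. 5;    # fills indexes 3, 4 and 5
print scalar(@line), " elements\n";

# Scalar context: flip-flop. True from the element matching the left pattern
# through the element matching the right pattern, inclusive.
sub lines_between {
    my @out;
    for (@_) {
        push @out, $_ if /^BEGIN$/ .. /^END$/;
    }
    return @out;
}

print join(',', lines_between(qw(x BEGIN a b END y))), "\n";
```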

Will Woodhull
[EMAIL PROTECTED]




Re: system ('time', $time) driving me nuts

2003-08-10 Thread Will of Thornhenge


[EMAIL PROTECTED] wrote:



It should be:

system ('time', "$time") and warn "no joy: $?";

where $? is the ever-popular CHILD_ERROR from the OS.


Ah! Thanks for the correction-- I do tend to use $! where $? or $@ is 
called for.


On the subject of the 'awkwardness' of the construct, I like it.  Randal
Schwartz made it look Perlish and peculiar on purpose.  But it runs like
the traditional
whatever_function or die "yadda yadda: $!";

that we're used to.  I usually write it more in the form of

system (@sys_args) and die "system @sys_args failed: $?\n";

but I adapted it to the form requested by the OP.


I like your approach; I can see where it would have advantages in some 
places.

system("time", $time) == 0 or die "Bad call 'time $time': $?\n";
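
For completeness, a sketch of pulling the pieces out of $? after a list-form system call. The sub name run_and_get_exit is hypothetical, and $^X (the running perl) is used only so the example has a command that exists everywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its exit
# status. Dies only if the command could not start or was killed by a signal.
sub run_and_get_exit {
    my @cmd = @_;
    my $rc = system(@cmd);
    die "Could not run '@cmd': $!\n" if $rc == -1;
    die "'@cmd' died with signal " . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;    # high byte of $? holds the child's exit status
}

my $status = run_and_get_exit($^X, '-e', 'exit 3');
print "exit status: $status\n";
```

Note that $! is only meaningful in the -1 case, when the command never started; otherwise $? is the value to look at.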

Will Woodhull
[EMAIL PROTECTED]


Data::Dump format question

2003-08-05 Thread Will of Thornhenge
I just tried to use Data::Dump in place of Data::Dumper->Dump for the 
first time. I see some nice advantages in the simpler interface.

But it is formatting zip codes as '97_479'. Worse yet, it is doing that 
to serial numbers I'm using as hash keys-- that would mess up a 
persistent data store!

Is there any way of suppressing the underscore when it formats numbers 
as strings? Or is there a real simple post-processing trick to patch 
things up?

--
Will Woodhull
[EMAIL PROTECTED]




Help with regex

2003-08-01 Thread Will of Thornhenge
I'm doing some work with mail headers that involves converting 
timestamps to a standard format. The following regex works except for 
one pesky trailing close parens.

Here's a sample of the data that causes problems:

==sample data
Date: Fri, 1 Aug 1997 08:10:16 -0700 (PDT)
===
This is converted to a YYYYMMDD.hhmmss format in place, then the result 
is fed to this regex:

==code extract
# handle YYYYMMDD.hhmmss +0530 (IST) and similar
while (/\b
(   # $1 to $old
 (\d{8}\.\d{6}) # $2 to datestamp
 \s+
 ([-+]?\d\d\d\d)# $3 to $timezone
 ( \s+ [(]? # $4 if there is an abbrev,
  [A-Z]{2,5}# like EST or (EST)
  [)]? )?   # then just get rid of it
)
   \b/x ) {
   my ($old, $d1, $z1, ) = ($1, $2, $3, );
   if (exists $timeZones{$z1}) {
  my $z2 = $timeZones{$z1};  # obtain the abbreviation
  $z1 = $timeZones{$z2}; # then the numeric value for the abbrev
  my $d2 = date2Epoch($d1) + 3600 * ($tz - $z1);
  s/\Q$old\E/'_' . epoch2Date($d2) . ' ' . $tzabbrev/e;
   }
   else {
  s/\Q$old\E/_$old/;   # just mark it unchanged
   }
}
s/_(\d{8}\.\d{6})/$1/g;# clean up markers
return $_;

The output I'm getting is

==converted sample
Date: 19970801.071016 PST)

The continued existence of that closing parens is the problem. It is not 
being included in $1, which becomes $old. How can I force its inclusion 
(and why is the regex not behaving greedily?)
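
One likely explanation, offered as a sketch and not as a tested fix for the full routine: the trailing \b cannot match after ")" at the end of the line, because neither side of that position is a word character. The regex is still greedy, but it has to backtrack and give the ")" back so that \b can match between the "T" and the ")". The stripped-down patterns below are my own reduction of the one above, and replacing the final \b with (?!\w) is only one possible change.

```perl
use strict;
use warnings;

my $line = 'Date: 19970801.081016 -0700 (PDT)';

# Same shape as the pattern in the post, ending in \b.
my $with_b = qr/\b(\d{8}\.\d{6}\s+[-+]?\d{4}(?:\s+[(]?[A-Z]{2,5}[)]?)?)\b/;

# Variant: the final \b is replaced by "not followed by a word character".
my $no_b = qr/\b(\d{8}\.\d{6}\s+[-+]?\d{4}(?:\s+[(]?[A-Z]{2,5}[)]?)?)(?!\w)/;

my ($first)  = $line =~ $with_b;
my ($second) = $line =~ $no_b;

print "ending in \\b:     <$first>\n";
print "ending in (?!\\w): <$second>\n";
```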

--
Will Woodhull
[EMAIL PROTECTED]


Re: Recursive design

2003-07-11 Thread Will of Thornhenge


Roger C Haslock wrote:
As usual, it's been done before. Look at the modules which support
SpamAssassin, and particularly MIME-Tools. Get them from CPAN :-)

Ah! Good point-- I can maybe study out how Eryq managed this problem in 
MIME-Tools. Unfortunately, I can't use MIME-Tools directly with the data 
sets I'm working with. Some of the older files, from around 1996, appear 
to have been reprocessed in bad ways (or maybe the messages were 
generated by software that didn't do MIME right). Anyway, they've got a 
scattering of malformed headers and broken encodings, and I think I'm 
better off rolling my own simple routines in this situation. Besides, 
maybe I'll learn something.

--
Will


Recursive design

2003-07-11 Thread Will of Thornhenge
I hate recursion. It makes my head hurt.

Background: I'm working on reformatting .mbox files to convert email 
archives to HTML and to PDA-compatible text. I'm running into problems 
with the MIME types "multipart/mixed" and "multipart/related". These are 
umbrella types that can hold an assortment of simple types, like 
"text/plain" and "image/jpeg". However, they can also hold other 
multipart types, which can happen when someone backquotes the entirety 
of a previous multipart message.

There is also the "multipart/alternative" type, but these are always 
collections of simple types where the user agent chooses one and ignores 
the rest-- they are never re-entrant.

My code is something like this (following is simplified to keep it short 
and on point):

sub handleBody {
   my ($type, $body) = @_;
   my $superbody = '';
   if ( $type =~ m{multipart}i ) {
      my ( @parts ) = splitOnBoundary($type, $body);
      if ( $type =~ m{alternative}i ) {
         # code to find the $best of the alternatives in @parts
         ($type, $body) = handlePart($parts[$best]);
         processSimpleType($type, $body);
         return $body;
      }
      else { # PLACE WHERE MY HEAD HURTS
         foreach (@parts) {
            # recurse: each part may itself be multipart
            $superbody .= handleBody(handlePart($_) );
         }
         return $superbody;
      }
   }
   else { # handle a simple type
      processSimpleType($type, $body);
      return $body;
   }
}
sub handlePart {
   my $part = shift;
   # split header from body at the first empty line
   my ($head, $body) = split /^$/m, $part, 2;
   # treat exceptions as a type of its own:
   my $type = '[NONE STATED]';
   if ( $head =~ m{^Content-Type: (.*)$}mi ) {
      $type = $1;
   }
   return ($type, $body);
}
Hmm, as I wrote this, I discovered the apparent need for $superbody, and 
I think I may have solved my logic problem. So the first of my two 
questions:

1) Does the above code look right?

A major difficulty is that I'm dealing with archives where some of the 
messages aren't fully compliant with the MIME standard and I can't tell 
whether the bugs I've got are in the logic or because I need to tweak 
the regexes to handle the special cases. Which brings me to the other, 
more important, question:

2) Is there a better tool than pseudocode for designing re-entrant 
code? How do people who do a lot of this kind of thing work out 
the design?
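
One way to separate the two kinds of bugs, sketched under assumptions: test the recursion on a hand-built tree first, with no parsing involved, so that any failure there is a logic problem and not a regex problem. The tree layout (hashes with 'type' and either 'body' or 'parts'), the name flatten_body, and the rule of preferring text/plain among alternatives are all invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical pre-parsed message: each node has a 'type' and either a
# 'body' string (simple type) or a 'parts' array reference (multipart).
sub flatten_body {
    my ($node) = @_;
    my $type = $node->{type};

    # Base case: a simple type just returns its body.
    return $node->{body} unless $type =~ m{^multipart/}i;

    my @parts = @{ $node->{parts} || [] };
    return '' unless @parts;

    if ( $type =~ m{alternative}i ) {
        # Assumption: prefer text/plain, otherwise take the first alternative.
        my ($best) = grep { $_->{type} =~ m{^text/plain}i } @parts;
        $best = $parts[0] unless defined $best;
        return flatten_body($best);
    }

    # mixed, related, etc.: recurse into every part and concatenate.
    return join '', map { flatten_body($_) } @parts;
}

my $tree = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain', body => "A" },
        { type  => 'multipart/alternative',
          parts => [
              { type => 'text/html',  body => "<b>B</b>" },
              { type => 'text/plain', body => "B" },
          ] },
        { type  => 'multipart/related',
          parts => [ { type => 'text/plain', body => "C" } ] },
    ],
};

print flatten_body($tree), "\n";
```

Once the recursion behaves on trees like this, the splitting and header regexes can be tested separately against the malformed messages.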

--
Will