Re: Chomp to trim '\r'
Carl Jolley wrote:

After all, all binmode is, is a way to turn off perl's default way of handling the "\r" for you. It's your way of saying to perl "Thanks, but no thanks for your offer of assistance. For this file, I can handle this issue without your help".

While this is true within the context of this discussion (newline conversions), binmode does other things that can also be very important.

For instance, I found out several years ago that it was necessary to use binmode when opening some Word97 files in Perl scripts (under Win98), or some of the files would be truncated in the proprietary MS header. As I recall, the specific problem was a \000 byte early in the header that Perl would treat as an eof (unless binmode was in use), but I understand that binmode also turns off Perl's default handling of embedded Ctrl-Z and other control characters.

So I'm not disagreeing with you, but simply pointing out that whenever the input file is in a proprietary format that mixes text with binary segments, binmode is something to think about. At least under Windows, DOS, and related OSs.

And then there's the whole disciplines thing that can be done with binmode (or a 3 argument open). But I don't go there; it is my fervent hope that Perl 6.0 will fix up all that stuff before I have to deal with any multibyte Unicode data. :-)

-- Will Woodhull [EMAIL PROTECTED]
___ Perl-Win32-Users mailing list [EMAIL PROTECTED] To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
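For readers who want to see the binmode point in action, here is a minimal sketch, not code from the thread. The helper name slurp_raw and the sample bytes are made up for illustration; the sample only stands in for a mixed text/binary header. It writes a few bytes that text mode may mangle on some systems and reads them back with binmode in effect.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as raw bytes.
# binmode() switches off text-mode translation, so "\r\n" pairs stay as they
# are and, on systems that honor one, an end-of-file marker such as Ctrl-Z
# (\x1A) is read as ordinary data.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    local $/;                 # slurp mode: read to end of file
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Made-up stand-in for a proprietary header: NUL, Ctrl-Z and CRLF mixed in.
my $sample = "HDR\x00\x1A\r\nbody text\r\n";

my ($out, $tmpname) = tempfile(UNLINK => 1);
binmode $out;                 # write the bytes untranslated as well
print {$out} $sample;
close $out;

my $bytes = slurp_raw($tmpname);
printf "wrote %d bytes, read %d bytes\n", length($sample), length($bytes);
```

On Unix-like systems the two counts match with or without binmode; the call is cheap there, so leaving it in keeps the script portable.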
Re: Regex Help Needed
Have you tried playing around with character sets? Something like

    $target = 'mevqgn';
    $length_target = length $target;
    if ( $LS_Val =~ /-{1,2}[$target]{$length_target}/ ) {
        #do something
    }

Whether the above would work for you would depend on whether the code can ignore positive matches on $LS_Val = '--mmmqqq' and so forth. It might be worthwhile to look more closely at the data and see whether there are "don't care" cases that you can ignore.

If there are not, then there is a loop approach:

    $t = 'mevqgn'; # just to save keystrokes
    $x = $LS_Val;
    if ( $x =~ /(-{1,2})/ ) {
        $goodSoFar = $1;
        while (length $t and $x =~ /($goodSoFar([$t]))/ ) {
            $goodSoFar = $1;
            $t =~ s/$2//;
        }
        do_Something() unless length $t;
    }

That's undoubtedly slower than your original approach, but would be more versatile and possibly easier to maintain. (Neither snippet above has been tested.)

Dax T. Games wrote:

I have a list of characters. I need to get a list of all possible sequences of these characters. For example, I have a string that consists of '-mevqgn'. I need to pattern match any combination of 'mevqgn' with a preceding - or --.

Right now this is what I am doing but it is very ugly and difficult to come up with the combinations and it makes my brain hurt!:

    if ($LS_Val =~ /-{1,2}(mevqgn|
        emvqgn|evmqgn|evqmgn|evqgmn|evqgnm|
        veqgnm|vqegnm|vqgenm|vqgnem|vagnme|
        qvgnme|qgvnme|qgnvme|qgnmve|qgnmev|
        gqmnev|gmqnev|gmnqev|gmneqv|gmnevq|
        mgnevq|mngevq|mnegvq|mnevgq|mnevqg|
        nmevqg|nemvqg|nevmqg|nevqmg|nevqgm|
        envqgm|evnqgm|evqngm|evqgnm|evqgmn|
        )/i) {
        #Do Something;
    }

A subroutine that takes the string of characters as an argument and then returns 1 on success and undef on fail would be ideal for my purpose.

Any help is appreciated.

Dax

-- Will Woodhull [EMAIL PROTECTED]
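The request for a subroutine that returns 1 on success and undef on failure could be met along these lines. This is only a sketch under assumptions: the name matches_permutation and its two-argument interface are invented here, and it swaps in a different technique from the two snippets above (comparing sorted characters), so repeats like '--mmmqqq' are rejected.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions, not from the thread).
# Returns 1 if $value contains one or two dashes followed by some ordering of
# every character in $chars, each used exactly once; returns undef otherwise.
sub matches_permutation {
    my ($value, $chars) = @_;
    my $n    = length $chars;
    my $want = join '', sort { $a cmp $b } split //, lc $chars;

    # Find each dash run followed by $n characters drawn from the set, then
    # confirm the candidate is a true rearrangement by comparing sorted forms.
    while ($value =~ /-{1,2}(?=([\Q$chars\E]{$n}))/gi) {
        my $got = join '', sort { $a cmp $b } split //, lc $1;
        return 1 if $got eq $want;
    }
    return;    # undef in scalar context
}

for my $candidate ('-mevqgn', '--ngqvem', '--mmmqqq', 'mevqgn') {
    my $result = matches_permutation($candidate, 'mevqgn');
    printf "%-10s %s\n", $candidate, defined $result ? 'match' : 'no match';
}
```

Sorting sidesteps the need to list or generate the 720 orderings of six characters; the cost is one small sort per candidate.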
Re: empty versus zero
Lynn. Rickards wrote:

Thanks Will but I'll own up to maybe not reading the spec closely enough. Is the string " " to be considered empty? It passes defined() nevertheless...as does the empty string ''. Embarrassing.

No, you were correct; the context that was supplied showed that the "empty" array elements had to be populated with nulls. The quick test code I wrote up gave me results that looked like what I've seen in sparse arrays, and I screwed up in not recognizing that split can't generate a sparse array. My apologies to the list.

[Note to self: stop trying to interleave perl-win32-users interactions with other activities. Remember that the questions may be more subtle than they first appear.]

-- Will Woodhull [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Will of Thornhenge
Sent: Wednesday, August 13, 2003 8:36 PM
To: [EMAIL PROTECTED]
Cc: 'David Byrne'
Subject: Re: empty versus zero

I prefer Lynn. Rickards' method to the others I've seen mentioned. Testing whether the value is defined will always work quietly and swiftly; several of the other tests proposed will generate warnings under some conditions, which can really bog down a loop.

Lynn. Rickards wrote:

I think this is a fairly simple question... How can I count empty values in an array? This count should not include zeros or non-empty values. Below is my current script, but it isn't working properly. I appreciate any assistance that you may provide.

Thank you,
David

    #!perl -w
    # Count missing values
    while (my $line = <>) {
        chomp ($line);
        my ($probe_id,$expression) = split /\t/, $line, 2;
        my @expression = split /\t/, $expression;
        my $count = 0;
        foreach my $value (@expression) {
            # if (!$value) {

One way could be to use defined()

            unless (defined($value)) {
                $count += 1;
            }
        }
        print "$count\n";
    }

Another way, utilizing the magic that "for" and "defined" do with "$_":

    for (@expression) { $count++ if not defined }
    print $count, "\n";

So lots of different ways to do it.
-- Will Woodhull [EMAIL PROTECTED]
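Since the follow-up concedes that split cannot produce undefined elements, a count of "empty but not zero" fields has to compare against the empty string instead of testing defined(). The following is a minimal sketch of that idea, not code from the thread; the helper name count_empty_fields and the sample record are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: count tab-separated fields that are empty strings.
# Zeros and other non-empty values are not counted.
sub count_empty_fields {
    my ($record) = @_;
    # A limit of -1 keeps trailing empty fields, which split otherwise drops.
    my @fields = split /\t/, $record, -1;
    return scalar grep { $_ eq '' } @fields;
}

# Made-up input line: an id followed by expression values, some missing.
my $line = "probe1\t1.5\t\t0\t\t2.0\t";
my (undef, $expression) = split /\t/, $line, 2;

my $empty = count_empty_fields($expression);
print "$empty\n";
```

Comparing with eq '' avoids the original script's problem, where !$value is true for 0 as well as for an empty field.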
Re: CSV munging and "uninitialized values"
Terry Fowler wrote:

"$Bill Luebkert" wrote:

    for (@line .. 41) {
        $line[$_] = '';
    }

I've seen this use of ".." only a few times before and don't really know what it's all about. I don't even know what to look for in the Llama book - "them two dots"?

Terry Fowler

"range operator". In the Camel ed 3, it is discussed on p 103. While its use is straightforward in list context, its behavior in scalar context surprises me.

Will Woodhull [EMAIL PROTECTED]
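To make the two behaviors of ".." concrete, here is a short sketch; the sample arrays and the BEGIN/END markers are invented for illustration. The first loop mirrors the padding idiom quoted above on a smaller scale, and the second shows the scalar-context "flip-flop" form that tends to surprise people.

```perl
use strict;
use warnings;

# List context: ".." expands to every integer from the left value to the
# right value. The array on the left is read as its element count.
my @line = ('a', 'b', 'c');
for (@line .. 5) {              # here the same as 3 .. 5
    $line[$_] = '';
}
print scalar(@line), " elements\n";

# Scalar context: ".." is a flip-flop. It turns true when the left test
# succeeds and stays true until the right test succeeds (inclusive).
my @input = ('skip', 'BEGIN', 'keep 1', 'keep 2', 'END', 'skip too');
my @between;
for my $text (@input) {
    if ($text =~ /^BEGIN$/ .. $text =~ /^END$/) {
        push @between, $text;
    }
}
print join(', ', @between), "\n";
```

The flip-flop form is mostly used for pulling a marked block of lines out of a file, which is why it looks nothing like the counting form.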
Re: system ('time', $time) driving me nuts
[EMAIL PROTECTED] wrote:

It should be:

    system ('time', "$time") and warn "no joy: $?";

where $? is the ever-popular CHILD_ERROR from the OS.

Ah! Thanks for the correction-- I do tend to use $! where $? or $@ is called for.

On the subject of the 'awkwardness' of the construct, I like it. Randal Schwartz made it look Perlish and peculiar on purpose. But it runs like the traditional

    whatever_function or die "yadda yadda: $!";

that we're used to. I usually write it more in the form of

    system (@sys_args) and die "system @sys_args failed: $?\n";

but I adapted it to the form requested by the OP.

I like your approach; I can see where it would have advantages in some places.

    system("time", $time) == 0 or die "Bad call 'time $time': $?\n";

Will Woodhull [EMAIL PROTECTED]
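The reason both "system(...) and die" and "system(...) == 0 or die" work is that system returns the child's wait status, where 0 means success. A small sketch follows; the helper name exit_code_of is hypothetical, and $^X (the running perl) is used only as a portable stand-in for an external command such as 'time'.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and return its exit code.
# system() returns the wait status (also left in $?); the exit code proper
# sits in the high byte, hence the shift. Returns undef if the command could
# not be started at all, in which case $! holds the reason.
sub exit_code_of {
    my @cmd = @_;
    my $status = system(@cmd);
    return if $status == -1;
    return $status >> 8;
}

my $code = exit_code_of($^X, '-e', 'exit 3');
print "exit code: $code\n";

# The "== 0 or die" form reads closest to the usual "or die" idiom.
# The parentheses matter: without them "== 0" would attach to the last argument.
system($^X, '-e', 'exit 0') == 0
    or die "child failed: $?\n";
```

Passing the command as a list, as every variant in the thread does, also avoids handing the arguments to a shell for interpretation.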
Data::Dump format question
I just tried to use Data::Dump in place of Data::Dumper->Dump for the first time. I see some nice advantages in the simpler interface. But it is formatting zip codes as '97_479'. Worse yet, it is doing that to serial numbers I'm using as hash keys-- that would mess up a persistent data store! Is there any way of suppressing the underscore when it formats numbers as strings? Or is there a real simple post-processing trick to patch things up? -- Will Woodhull [EMAIL PROTECTED]
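One observation that may bear on the persistence worry, offered as a sketch about Perl itself and not as a statement about Data::Dump's options: Perl reads underscores inside a bare numeric literal as digit separators. So, assuming the dumped text shows the value as an unquoted literal and is read back with eval or do, the underscore should disappear on the way in. The text in $dumped below is written by hand for illustration, not produced by Data::Dump; a quoted string '97_479' would keep its underscore.

```perl
use strict;
use warnings;

# Hand-written text in the style described in the post (an assumption, not
# actual Data::Dump output): bare numeric literals with underscores as keys.
my $dumped = "{ 97_479 => 'zip code entry', 1_000_001 => 'serial entry' }";

# The leading "+" makes Perl parse the braces as an anonymous hash, not a block.
my $data = eval "+$dumped";
die "eval failed: $@" if $@;

# The keys come back without underscores.
print join(', ', sort keys %$data), "\n";
```

If the store is read by anything other than Perl, this does not help, and the dumped text itself would need to be free of underscores.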
Help with regex
I'm doing some work with mail headers that involves converting timestamps to a standard format. The following regex works except for one pesky trailing close paren.

Here's a sample of the data that causes problems:

==sample data
Date: Fri, 1 Aug 1997 08:10:16 -0700 (PDT)
===

This is converted to a YYYYMMDD.hhmmss format in place, then the result is fed to this regex:

==code extract
    # handle YYYYMMDD.hhmmss +0530 (IST) and similar
    while (/\b
            (                       # $1 to $old
              (\d{8}\.\d{6})        # $2 to datestamp
              \s+
              ([-+]?\d\d\d\d)       # $3 to $timezone
              (
                \s+
                [(]?                # $4 if there is an abbrev,
                [A-Z]{2,5}          # like EST or (EST)
                [)]?
              )?                    # then just get rid of it
            )
            \b/x ) {
        my ($old, $d1, $z1, ) = ($1, $2, $3, );
        if (exists $timeZones{$z1}) {
            my $z2 = $timeZones{$z1};   # obtain the abbreviation
            $z1 = $timeZones{$z2};      # then the numeric value for the abbrev
            my $d2 = date2Epoch($d1) + 3600 * ($tz - $z1);
            s/\Q$old\E/'_' . epoch2Date($d2) . ' ' . $tzabbrev/e;
        } else {
            s/\Q$old\E/_$old/;          # just mark it unchanged
        }
    }
    s/_(\d{8}\.\d{6})/$1/g;             # clean up markers
    return $_;

The output I'm getting is

==converted sample
Date: 19970801.071016 PST)

The continued existence of that closing paren is the problem. It is not being included in $1, which becomes $old. How can I force its inclusion (and why is the regex not behaving greedily?)

-- Will Woodhull [EMAIL PROTECTED]
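A likely explanation, sketched here and not taken from the thread: the trailing \b needs a word character on exactly one side. Between ")" and the end of the line there is none, so \b fails there and the engine backtracks until "[)]?" matches nothing, where \b succeeds between "T" and ")". The quantifier is still greedy; it just has to give the paren back for the overall match to succeed. The pattern below is a condensed copy of the one in the post, and replacing the final \b with (?!\w) is one possible fix, an assumption to be checked against the real data.

```perl
use strict;
use warnings;

# Sample already converted to the YYYYMMDD.hhmmss form described in the post.
my $sample = 'Date: 19970801.081016 -0700 (PDT)';

# Same shape as the pattern in the post, ending in \b.
my $with_boundary =
    qr/\b(\d{8}\.\d{6}\s+[-+]?\d{4}(?:\s+[(]?[A-Z]{2,5}[)]?)?)\b/;

# Variant: end with a negative lookahead, which only insists that no word
# character follows, so it is satisfied right after the ")".
my $with_lookahead =
    qr/\b(\d{8}\.\d{6}\s+[-+]?\d{4}(?:\s+[(]?[A-Z]{2,5}[)]?)?)(?!\w)/;

my ($old_b) = $sample =~ $with_boundary;
my ($old_l) = $sample =~ $with_lookahead;

print "with \\b:     [$old_b]\n";
print "with (?!\\w): [$old_l]\n";
```

The lookahead keeps the original intent of \b (do not stop in the middle of a word) while tolerating punctuation as the last matched character.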
Re: Recursive design
Roger C Haslock wrote:

As usual, it's been done before. Look at the modules which support SpamAssassin, and particularly MIME-Tools. Get them from CPAN :-)

Ah! Good point-- I can maybe study out how Eryq managed this problem in MIME-Tools.

Unfortunately, I can't use MIME-Tools directly with the data sets I'm working with. Some of the older files, from around 1996, appear to have been reprocessed in bad ways (or maybe the messages were generated by software that didn't do MIME right). Anyway, they've got a scattering of malformed headers and broken encodings, and I think I'm better off rolling my own simple routines in this situation.

Besides, maybe I'll learn something.

-- Will
Recursive design
I hate recursion. It makes my head hurt.

Background: I'm working on reformatting .mbox files to convert email archives to HTML and to PDA compatible text. I'm running into problems with the MIME types "multipart/mixed" and "multipart/related". These are umbrella types that can hold an assortment of simple types, like "text/plain" and "image/jpeg". However they can also hold other multipart types, which can happen when someone backquotes the entirety of a previous multipart message. There is also the "multipart/alternative" type, but these are always collections of simple types where the user agent chooses one and ignores the rest-- they are never re-entrant.

My code is something like this (following is simplified to keep it short and on point):

    sub handleBody {
        my ($type, $body) = @_;
        my $superbody = '';
        if ( $type =~ m{multipart}i ) {
            my ( @parts ) = splitOnBoundary($type, $body);
            if ( $type =~ m{alternative}i ) {
                # code to find the $best of the alternatives in @parts
                ($type, $body) = handlePart($parts[$best]);
                processSimpleType($type, $body);
                return $body;
            } else {
                # PLACE WHERE MY HEAD HURTS
                foreach (@parts) {
                    $superbody .= handleBody(handlePart($_) );
                }
                return $superbody;
            }
        } else {
            # handle a simple type
            processSimpleType($type, $body);
            return $body;
        }
    }

    sub handlePart {
        my $part = shift;
        my ($head, $body) = split /^$/m, $part, 2;
        # treat exceptions as a type of its own:
        my $type = '[NONE STATED]';
        if ( $head =~ m{^Content-Type: (.*)$}mi ) {
            $type = $1;
        }
        return ($type, $body);
    }

Hmm, as I wrote this, I discovered the apparent need for $superbody, and I think I may have solved my logic problem.

So the first of my two questions:

1) Does the above code look right? A major difficulty is that I'm dealing with archives where some of the messages aren't fully compliant with the MIME standard and I can't tell whether the bugs I've got are in the logic or because I need to tweak the regexes to handle the special cases.

Which brings me to the other, more important, question:

2) Is there a better tool for designing re-entrant code other than pseudocode? How do people who do a lot of this kind of thing work out the design?

-- Will
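One way to check the shape of the recursion separately from the regex and boundary-splitting problems is to run it over an already-parsed tree. The sketch below does that; it is not the poster's code. The hash layout (type, body, parts), the helper name render_part, and the rule of preferring text/plain among alternatives are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical pre-parsed message: a leaf has {type, body}; a multipart
# container has {type, parts}. Returns the concatenated text of the message.
sub render_part {
    my ($part) = @_;
    my $type = lc $part->{type};

    # Base case: a simple type is returned as-is and the recursion stops.
    return $part->{body} unless $type =~ m{^multipart/};

    my @parts = @{ $part->{parts} };
    return '' unless @parts;

    if ($type eq 'multipart/alternative') {
        # Choose one alternative (text/plain if present, else the first)
        # and ignore the rest.
        my ($best) = grep { lc($_->{type}) eq 'text/plain' } @parts;
        $best = $parts[0] unless defined $best;
        return render_part($best);
    }

    # Recursive case: every sub-part, simple or multipart, goes back through
    # the same routine and the results are joined in order.
    return join '', map { render_part($_) } @parts;
}

my $message = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain', body => 'A' },
        { type  => 'multipart/alternative',
          parts => [
              { type => 'text/html',  body => '<b>B</b>' },
              { type => 'text/plain', body => 'B' },
          ] },
        { type  => 'multipart/mixed',
          parts => [ { type => 'text/plain', body => 'C' } ] },
    ],
};

print render_part($message), "\n";
```

Writing the base case first and letting each call return its own string plays the role that $superbody plays in the post: nothing outside the current call needs to be tracked.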