Re: Quick Perl Question
1.

    $filename = 'foo.txt';
    open(FH, $filename) or die "couldn't open $filename - $!";
    while ($line = <FH>) {
        print "$line matches\n" if ($line =~ /^USD /);
    }

2.

    while ($line = <FH>) {
        chomp $line;
        next unless $line;
        next if ($line =~ /^-+?$/);
        next if ($line =~ /^=+?$/);
        # only good lines get this far
    }

Jack Lauman [EMAIL PROTECTED] 06/19/01 05:05am

1. I want to read in a text file and match any line that begins with three capital letters followed by a space, i.e. "USD ". How do you do that?

2. I need to ignore any blank lines, lines containing all ---, lines containing all ===. Again, how?

Thanks in advance,
Jack
Re: Quick Perl Question
Me wrote:

Analysis of the code you attached.

    $quote_date = substr($_,0,79);

The above line is pointless.

--- Agreed.

The next couple lines are great:

    ($year, $month, $mday, $hour, $minute, $second, $timezone) =
        $quote_date =~ /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;

Although as I indicated, you could omit $quote_date and just do:

    ($year, $month, $mday, $hour, $minute, $second, $timezone) =
        /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;

The following code is pointless:

    $year = $1; $month = $2; $mday = $3; $hour = $4;
    $minute = $5; $second = $6; $timezone = $7;

---
Then you end the while loop!

--- Disagree. The code is very relevant and allows the manipulation of the date, time and timezone using Date::Manip before it is written to the file.

Everything that follows is working on the last line from the file. That makes no sense.

---
Ignoring that, your code makes no sense anyway. Your code has statements like this:

    /^[A-Z]{3} /;

These do absolutely nothing.

--- Agreed.

---
Try the following and see if it works. Post to the list if there are any problems with it.

    while (<INFILE>) {
        ($year, $month, $mday, $hour, $minute, $second, $timezone) =
            /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
        $year and last;    # if we've matched the date line, then bail out.
        eof and print STDERR "Didn't find date line";
    }
    print OUTFILE "$month/$year...";

--- Returns empty strings.

    # Now have date info and we're part way through file
    while (<INFILE>) {
        ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
            /^([A-Z]{3})( [A-Za-z]+)+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

--- Truncates $cur_desc after first word. Loses the date values.

        # if we've matched a currency line, note that we've started:
        $cur_sym and $started++;
        # if we have not matched a currency line, and we've started,
        # well, then we've ended, and if we haven't started, we need
        # to move on to the next line:
        not $cur_sym and ($started and last) or next;
        # Now we have a matching line. Process variables as required.
        # One variable has one bit of space we don't want:
        substr($cur_desc, 0, 1) = '';    # get rid of leading space.
    }
Re: Quick Perl Question
>     ($year, $month, $mday, $hour, $minute, $second, $timezone) =
>         /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
>
> The following code is pointless:
>
>     $year = $1; $month = $2; $mday = $3; $hour = $4;
>     $minute = $5; $second = $6; $timezone = $7;
>
> ---
> Then you end the while loop!
>
> --- Disagree. The code is very relevant and allows the manipulation
> of the date, time and timezone using Date::Manip before it is written
> to the file.

Well, the $year = $1 etc. are definitely pointless. The statement immediately prior has just set the variables, so the $year = $1 simply overwrites their values with exactly the same values as were just assigned. Try:

    ($year, $month, $mday, $hour, $minute, $second, $timezone) =
        /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
    print $year;
    $year = $1;
    print $year;

and you will see that $year doesn't change in between prints.

The ending of the loop is ok, but it means you've ended the loop. So the subsequent lines aren't in a loop, so they all work on whatever happened to be the last line in the input. Which is not what you want. As I said:

> Everything that follows is working on the last line from the file.
> That makes no sense.
>
> Try the following and see if it works. Post to the list if there are
> any problems with it.
>
>     while (<INFILE>) {
>         ($year, $month, $mday, $hour, $minute, $second, $timezone) =
>             /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
>         $year and last;    # if we've matched the date line, then bail out.
>         eof and print STDERR "Didn't find date line";
>     }
>     print OUTFILE "$month/$year...";
>
> --- Returns empty strings.

I've taken another look, and I would expect it to either print the error or print the match. Change the final print line to something like:

    print OUTFILE "TEST: $month/$year...";

and see if the 'TEST' appears. If it does, well, it's as if the INFILE loop isn't happening.

>     # Now have date info and we're part way through file
>     while (<INFILE>) {
>         ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
>             /^([A-Z]{3})( [A-Za-z]+)+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
>
> --- Truncates $cur_desc after first word.

I've reflected briefly, and I've no idea on that. I can't see how it's possible.

> Loses the date values.

Eh? What's this part got to do with the date values? How can setting these variables have anything to do with the other variables? I suspect I don't understand your terminology here.
Re: Quick Perl Question
Using the code below, currency.csv contains only the single line shown here. The date has been adjusted from 2000-12-30 00:16:19 UTC to PST. The rest of the file is still not being processed.

    2000-12-29,16:16:19,PST

    #!/usr/bin/perl
    #
    # cur2csv.pl
    #
    use strict;
    use vars qw($started);
    use vars qw($quote_date $cur_sym $cur_desc $usd_unit $units_usd);
    use vars qw($year $month $mday $hour $minute $second $timezone);
    use vars qw($conv_date $date $time $tz);
    use Date::Manip;
    use String::Strip;
    use DBI;
    use DBD::Pg;

    open (OUTFILE, ">", "currency.csv") || die "Can not open currency.csv for writing";
    printf STDERR "Reading currency file...";
    open (INFILE, "curtest") || die "Can not open /var/spool/mail/currency for reading";

    while (<INFILE>) {

        # Extract date and time of Currency Rate Quotation
        ($year, $month, $mday, $hour, $minute, $second, $timezone) =
            /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;

        # Convert date from UTC (GMT) to PST8PDT and adjust date and time accordingly.
        $tz = Date_TimeZone;
        $conv_date = "$year-$month-$mday $hour:$minute:$second";
        $conv_date = ParseDate($conv_date);
        $conv_date = Date_ConvTZ($conv_date, $timezone, $tz);
        $date = UnixDate($conv_date,"%Y-%m-%d");
        $time = UnixDate($conv_date,"%H:%M:%S");
        $tz   = UnixDate($conv_date,"%Z");

        $year and last;    # If we've matched the date line, then bail out.
        eof and print STDERR "Didn't find the date line";
    }

    # Extract the ISO 4217 Code for Currencies and Funds (1995)
    # Extract the Currency Description, and trim the trailing spaces
    # Extract US Dollars to Units rate, and trim the leading/trailing spaces
    # Extract Units to US Dollars rate, and trim the leading/trailing spaces
    while (<INFILE>) {
        ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
            /^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
        $cur_sym and $started++;
        not $cur_sym and ($started and last) or next;
        substr($cur_desc, 0, 1) = '';    # Delete leading space
    }

    printf OUTFILE "%s\,%s\,%s\,%s\,%s\,%s\,%s\n",
        $date, $time, $tz, $cur_sym, $cur_desc, $usd_unit, $units_usd;

    close(INFILE);
    close(OUTFILE);
    print STDERR "\n";
    1;
Re: Quick Perl Question
>     printf OUTFILE "%s\,%s\,%s\,%s\,%s\,%s\,%s\n",
>         $date, $time, $tz, $cur_sym, $cur_desc, $usd_unit, $units_usd;
>     close(INFILE);
>     close(OUTFILE);
>     print STDERR "\n";
>     1;

You seem to be misunderstanding one particular aspect of perl. Given the following:

    while (<INFILE>) {
        # do something with each line encountered
        # if condition, quit this while loop
    }
    # do something after first while loop
    while (<INFILE>) {
        # do something else with each line encountered
        # if another condition, quit this while loop
    }
    # do something after second while loop

The first 'do something' happens repeatedly until the first condition applies or the end of INFILE is reached, whichever comes first. In other words, it can happen zero times, or a dozen times, or whatever. The 'do something after first while loop' happens once. The second loop starts off, in terms of lines from the input file, where the first left off. Other than that, a similar deal regarding the 'do something's applies to this second set of perl code as it does to the first set of perl code.

In your case, the printf is outside the loops, so it will only happen once.

-

Other than that, given what printed, it's clear the second regex isn't matching at all. This isn't too surprising -- I didn't put a lot of effort in to it, so it might just be wrong. If it doesn't match, then the variables aren't set. To tighten the verification a little, add:

    $started or print STDERR "Didn't find a currency line";

after the second loop. This will surely print out, which shows that the regex didn't match. In other words:

    ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
        /^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

Doesn't match:

    USD United States Dollars 1.0 1.0

I can't see why not.

One other possibility is that the second while loop isn't even being entered; stick a print "TEST" at the start of the second loop to verify that it is being entered.
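The point about two loops sharing one filehandle can be tried in isolation. The following is only a sketch under stated assumptions: the sample data, the in-memory filehandle and the name scan are made up for illustration and are not part of cur2csv.pl. It shows the second loop picking up at the line after the one that ended the first loop, with the per-line work done inside the loop rather than after it.

```perl
use strict;
use warnings;

# Hypothetical sample input, held in memory so the sketch is self-contained.
my $data = join "\n",
    'junk header',
    'Rates as of 2000.12.30 00:16:19 UTC (GMT)',
    'USD United States Dollars 1.0 1.0',
    'CAD Canadian Dollars 1.5 0.66',
    '';

sub scan {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "Can't open in-memory file: $!";
    my ($date_line, @currency_lines);

    # First loop: stop as soon as the date line is seen.
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^Rates as of /) {
            $date_line = $line;
            last;
        }
    }

    # Second loop: resumes with the line after the one that ended loop one.
    # The per-line work (here, a push) happens inside the loop.
    while (my $line = <$in>) {
        chomp $line;
        push @currency_lines, $line if $line =~ /^[A-Z]{3} /;
    }
    close($in);
    return ($date_line, @currency_lines);
}

my ($date_line, @currency_lines) = scan($data);
print "date: $date_line\n";
print "currency: $_\n" for @currency_lines;
```

If the push were moved after the closing brace of the second loop, it would run once, on whatever the loop variable held when the loop ended, which is the situation described for the printf.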
Re: Quick Perl Question
The second loop is executing. The TEST statement worked.

> This will surely print out, which shows that the regex didn't match.
> In other words:
>
>     ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
>         /^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
>
> Doesn't match:
>
>     USD United States Dollars 1.0 1.0
>
> I can't see why not.

The Currency part of the email has a fixed format that is never deviated from:

    Columns   Content
    1-3       $cur_sym
    4         space
    5-32      $cur_desc
    33-35     (3) spaces
    36-55     d8.d10 (.00)
    56-58     (3) spaces
    59-78     d8.d10 (.00)

I tried to shorten it to match the $cur_sym only but couldn't get it to work. I also tried the following:

    ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
        /^([A-Z]{3}) ([A-Za-z]{28}) (\d{7}\.\d{10}) (\d{7}\.\d{10})\s*$/;

Any suggestions?

Jack
Re: Quick Perl Question
> The second loop is executing. The TEST statement worked.

Ok.

> The Currency part of the email has a fixed format that is never
> deviated from:
>
>     Columns   Content
>     1-3       $cur_sym
>     4         space
>     5-32      $cur_desc
>     33-35     (3) spaces
>     36-55     d8.d10 (.00)
>     56-58     (3) spaces
>     59-78     d8.d10 (.00)

This doesn't precisely match your example:

    USD United States Dollars 1.0 1.0

This includes numbers that are space padded in front and have just five zeroes after the decimal point (no doubt also space padded for the remaining decimal places).

> I tried to shorten it to match the $cur_sym only

Good move.

> but couldn't get it to work

Bummer.

> I also tried the following:
>
>     ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
>         /^([A-Z]{3}) ([A-Za-z]{28}) (\d{7}\.\d{10}) (\d{7}\.\d{10})\s*$/;

\d matches a digit. In your example, there are spaces where the above regex expects digits.

Are you sure my original currency line regex isn't matching? If you are still doing the print-after-the-loop-ends thing, then the variables will naturally be empty, because the loop ends when the regex fails to match, hence setting all the variables to null just before it exits the loop.

Going back to your attempt to get $cur_sym to match, try this as the first statements inside the second loop:

    ($cur_sym) = /^([A-Z]{3})/;
    print $cur_sym;

Does that do anything?
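Since the layout is described as fixed columns, one alternative to a regex is Perl's built-in unpack. This is a different technique from the one used in the thread, shown only as a rough sketch: the template is derived from the column list above, the sub name parse_currency_line and the sample line are invented, and the exact padding of the real mail is an assumption.

```perl
use strict;
use warnings;

# Column layout from the message above: 3-char symbol, 1 space,
# 28-char description, 3 spaces, 20-char number, 3 spaces, 20-char number.
my $TEMPLATE = 'A3 x1 A28 x3 A20 x3 A20';

sub parse_currency_line {
    my ($line) = @_;
    chomp $line;
    $line = sprintf('%-78s', $line);    # pad short lines so 'x' never runs off the end
    my ($sym, $desc, $usd_unit, $units_usd) = unpack($TEMPLATE, $line);
    return unless $sym =~ /^[A-Z]{3}$/;
    for ($usd_unit, $units_usd) {
        s/^\s+//;                       # 'A' strips trailing spaces; drop leading padding too
        return unless /^\d+\.\d+$/;     # reject anything that is not digits.digits
    }
    return ($sym, $desc, $usd_unit, $units_usd);
}

# Hypothetical sample line built to the 78-column layout.
my $sample = sprintf('%-3s %-28s   %20s   %20s',
    'USD', 'United States Dollars', '1.00000', '1.00000');
my @fields = parse_currency_line($sample);
print join(',', @fields), "\n" if @fields;
```

The trade-off is that unpack trusts the column positions completely, so it only makes sense if the format really never deviates; the regex approach is more forgiving of spacing changes.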
Re: Quick Perl Question
Got a combination that sort of works. It returns all the required fields but truncates any line where $usd_unit or $units_usd has more than 1 digit before the decimal point. There can be as many as (8) digits before and (10) digits after the decimal point in both cases.

Here's the regex I'm using:

    ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
        /^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;

As before I'm open to suggestion.

Thanks again,
Jack
Re: Quick Perl Question
> Got a combination that sort of works. It returns all the required
> fields but truncates any line where $usd_unit or $units_usd has more
> than 1 digit before the decimal point. There can be as many as (8)
> digits before and (10) digits after the decimal point in both cases.
> Here's the regex I'm using:
>
>     ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
>         /^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;

/^
    matches start of line. Ok.

[A-Z]{3}
    matches 3 uppercase letters. Ok.

([A-Z]{3})+
    matches 3, 6, 9, ... uppercase letters and puts the last set of 3 in to $cur_sym. Probably not what you meant. You should stick to what we had before: ([A-Z]{3}) with a space following as the next matching character of the pattern.

\s+
    matches one or more whitespace characters. Ok.

([A-Za-z\s]{28})
    matches the next 28 alpha or whitespace characters. (Whitespace means spaces or tabs or newlines.) Ok.

+
    matches the previous 28 character atom 1 or more times, and returns the last 28 character match as the second variable ($cur_desc). Not what you want. Remove this extraneous +.

\s+
    matches one or more whitespace characters. Ok.

(\d+\.\d+)
    matches one or more digits, followed by a literal period, followed by one or more digits. Ok.

+
    matches the previous atom 1 or more times. Again, not what you want. Remove the extraneous +.

\s+
    matches one or more whitespace characters. Ok.

(\d+\.\d+)
    matches one or more digits, followed by a literal period, followed by one or more digits. Ok.

/;
    means that anything can follow the rest of the pattern. I'd recommend tightening the pattern up by making the end be:

        \s+$/;

    which matches any amount of whitespace and then the end of the line.

Did you spot your mistake? I didn't, but I'll let you tidy up your regex first and see if you don't spot your problem.
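The remark about a + placed after a capture group is easy to check by hand. A small sketch, with made-up sub names and sample strings, showing that a repeated group keeps only its last repetition, while the unrepeated group followed by a space captures the symbol itself:

```perl
use strict;
use warnings;

# A capture group followed by + may match several times,
# but only the last repetition is kept in the capture.
sub last_repetition {
    my ($text) = @_;
    my ($captured) = $text =~ /^([A-Z]{3})+/;
    return $captured;
}

# Without the +, and with a space required next, the group
# captures exactly the three-letter symbol or nothing at all.
sub single_group {
    my ($text) = @_;
    my ($captured) = $text =~ /^([A-Z]{3}) /;
    return $captured;
}

print last_repetition('USDCAD rest'), "\n";               # prints CAD
print single_group('USD United States Dollars'), "\n";    # prints USD
```

The same behaviour applies to any quantified group, so a description group with a trailing + would likewise hold only the last chunk it matched.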
Re: Quick Perl Question
> 1. I want to read in a text file and match any line that begins with
> three capital letters followed by a space, i.e. "USD ".

    while (<>) {
        /^[A-Z]{3} / and dostuff;    # $_ contains line
    }

> 2. I need to ignore any blank lines, lines containing all ---, lines
> containing all ===.

    while (<>) {
        /^(\s|-|=)*$/ and next;
        /^[A-Z]{3} / and dostuff;    # $_ contains line
    }
Re: Quick Perl Question
I forgot to explain.

> 1. I want to read in a text file and match any line that begins with
> three capital letters followed by a space, i.e. "USD ".

    while (<>) {

will read from the file(s) you specify on the command line when you run your perl script, i.e.

    perl myscript.pl inputfile

or just

    myscript.pl inputfile

The magical incantation:

    while (<>) {

reads through the input file(s) a line at a time, putting the line in $_, which is a special 'default' variable that is assumed by lots of other perl functions.

    /^[A-Z]{3} / and dostuff;    # $_ contains line

Use /stuff/ to match something in $_. In a // expression, ^ matches at the start of a line (sorta). ($ matches at the end of a line.) [abc] is a // expression atom that matches a, b, or c. [a-c] would do the same job. Following some atom with {min,max} tells perl how many times to match. The word 'and' means, if the thing on the left is true, then also do the thing on the right. dostuff was a made up name of a sub procedure that you would have to declare elsewhere like this:

    sub dostuff {
        # code
    }

> 2. I need to ignore any blank lines, lines containing all ---, lines
> containing all ===.

    while (<>) {
        /^(\s|-|=)*$/ and next;

I got this wrong. The basic principle was to use () brackets to turn the content into a // expression atom, then use * after that to tell perl how many times to match the atom. This is just like {min,max}. * is shorthand for {0,infinity}. The \s means match any whitespace character. If a bunch of things in a //, or in a () enclosed atom, are separated by | symbols, then perl can match any of the separated things.

I got it wrong because what I wrote will match, say:

    -= -= -= -= -= -= -= - =- = - -=

or other combinations. This would more accurately fit what you asked for:

    /^(\s*|-*|=*)$/ and next;

The word 'next' means to go on to the next iteration in the loop code containing the 'next' command.
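To see the difference between the two skip patterns, they can be run side by side on a few sample lines. This is just an illustrative sketch; the sub names skip_loose and skip_strict and the sample lines are made up.

```perl
use strict;
use warnings;

# First attempt: any mixture of whitespace, '-' and '=' counts as skippable.
sub skip_loose {
    my ($line) = @_;
    return $line =~ /^(\s|-|=)*$/ ? 1 : 0;
}

# Corrected version: all whitespace, or all '-', or all '=' (no mixing).
sub skip_strict {
    my ($line) = @_;
    return $line =~ /^(\s*|-*|=*)$/ ? 1 : 0;
}

for my $line ('', '   ', '-----', '=====', '-= -= -=', 'USD United States Dollars') {
    printf "%-30s loose=%d strict=%d\n", "'$line'", skip_loose($line), skip_strict($line);
}
```

Only the mixed line '-= -= -=' is treated differently: the first pattern skips it, the corrected one does not.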
RE: quick PERL question
M.W. Koskamp writes ..

> The special variable $| sets the autoflush. See PERLVAR documentation.
> What this person does is a dirty way of setting $| to a true value
> (not 0 or undef). Default = 0.

why do you say 'dirty' ? .. do you just mean 'less readable' ? .. or are you implying some other problem with $|++ ?

--
jason king

No children may attend school with their breath smelling of wild onions in West Virginia. - http://dumblaws.com/