Re: Quick Perl Question

2001-06-19 Thread Nigel Wetters

1.

$filename = 'foo.txt';
open(FH,$filename) or die "couldn't open $filename - $!";
while ($line = <FH>){
print "$line matches\n" if ($line =~ /^USD /);
}

2.

while ($line=<FH>){
chomp $line;
next unless $line;
next if ($line =~ /^-+?$/);
next if ($line =~ /^=+?$/);
# only good lines get this far
}
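
Putting both answers together, a minimal sketch might look like the following. The sub name wanted_line and the sample lines are made up for illustration; the patterns follow the two snippets above, generalised from USD to any three capital letters.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line should be processed.
sub wanted_line {
    my ($line) = @_;
    chomp $line;
    return 0 unless length $line;           # blank line
    return 0 if $line =~ /^-+$/;            # line made up only of dashes
    return 0 if $line =~ /^=+$/;            # line made up only of equals signs
    return $line =~ /^[A-Z]{3} / ? 1 : 0;   # three capitals, then a space
}

# Made-up sample input.
for my $line ("USD United States Dollars\n", "-----\n", "\n", "Rates as of\n") {
    print $line if wanted_line($line);
}
```

Testing length rather than truth means a line containing just "0" is not mistaken for a blank one.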

>>> Jack Lauman [EMAIL PROTECTED] 06/19/01 05:05am >>>
1. I want to read in a text file and match any line that begins with
three capital letters followed by a space.  i.e. USD 

How do you do that?

2. I need to ignore any blank lines, lines containing all ---, lines
containing all ===.

Again, how?

Thanks in advance,

Jack






Re: Quick Perl Question

2001-06-19 Thread Jack Lauman



Me wrote:
 
 Analysis of the code you attached.
 
 $quote_date = substr($_,0,79);
 
 The above line is pointless.
 
--- Agreed.


 The next couple lines are great:
 
 ($year, $month, $mday, $hour, $minute, $second, $timezone) =
 $quote_date =~ /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
 
 Although as I indicated, you could omit $quote_date and just do:
 
 ($year, $month, $mday, $hour, $minute, $second, $timezone) =
 /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
 
 The following code is pointless:
 
 $year   = $1;
 $month  = $2;
 $mday   = $3;
 $hour   = $4;
 $minute = $5;
 $second = $6;
 $timezone   = $7;
 
 ---
 
 Then you end the while loop!

--- Disagree. The code is very relevant and allows the manipulation of
the date, time and timezone using Date::Manip before it is written to
the file.

 
 Everything that follows is working on the last line
 from the file. That makes no sense.
 
 ---
 
 Ignoring that, your code makes no sense anyway.
 
 Your code has statements like this:
 
  /^[A-Z]{3} /;
 
 These do absolutely nothing.
 

--- Agreed.
 ---
 
 Try the following and see if it works. Post to the list if there are any
 problems with it.
 
 while (<INFILE>) {
 
 ($year, $month, $mday, $hour, $minute, $second, $timezone) =
 /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
 
 $year and last; # if we've matched the date line, then bail out.
 
 eof and print STDERR "Didn't find date line";
 }
 
 print OUTFILE "$month/$year...";
 

--- Returns empty strings.

 # Now have date info and we're part way through file
 
 while (<INFILE>) {
 
 ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
 /^([A-Z]{3})( [A-Za-z]+)+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
 

-- Truncates $cur_desc after first word.  Loses the date values.

 # if we've matched a currency line, note that we've started:
 $cur_sym and $started++;
 
 # if we have not matched a currency line, and we've started,
 # well, then we've ended, and if we haven't started, we need
 # to move on to the next line:
 not $cur_sym and ($started and last) or next;
 
 # Now we have a matching line. Process variables as required.
 
 # One variable has one bit of space we don't want:
 substr($cur_desc, 0, 1) = ''; # get rid of leading space.
 }



Re: Quick Perl Question

2001-06-19 Thread Me

  ($year, $month, $mday, $hour, $minute, $second, $timezone) =
  /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
 
  The following code is pointless:
 
  $year   = $1;
  $month  = $2;
  $mday   = $3;
  $hour   = $4;
  $minute = $5;
  $second = $6;
  $timezone   = $7;
 
  ---
 
  Then you end the while loop!

 --- Disagree. The code is very relevant and allows the manipulation of
 the date, time and timezone using Date::Manip before it is written to
 the file.

Well, the $year = $1 etc are definitely pointless.
The statement immediately prior has just set
the variables, so the $year = $1 simply overwrites
their values with exactly the same values as were
just assigned. Try:

($year, $month, $mday, $hour, $minute, $second, $timezone) =
/^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;

print $year;

$year   = $1;

print $year;

and you will see that $year doesn't change in
between prints.
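
A small self-contained check of the same point: in list context the match returns its captures, so the variables are already set by the assignment and copying $1 and friends afterwards adds nothing. The sample line and the sub name parse_date_line are assumptions for illustration; only the pattern comes from the thread.

```perl
use strict;
use warnings;

# Assumed sample; the real date line in the mail may be laid out differently.
my $sample = "Rates as of 2000.12.30 00:16:19 UTC (GMT)";

# Hypothetical helper wrapping the pattern from the thread.
sub parse_date_line {
    my ($line) = @_;
    # In list context a successful match returns ($1, $2, ...),
    # and a failed match returns the empty list.
    return $line =~ /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
}

my ($year, $month, $mday, $hour, $minute, $second, $timezone) = parse_date_line($sample);
print "$year-$month-$mday $hour:$minute:$second $timezone\n";
```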

The ending of the loop is ok, but it means you've
ended the loop. So the subsequent lines aren't
in a loop, so they all work on whatever happened
to be the last line in the input. Which is not what
you want. As I said:

  Everything that follows is working on the last line
  from the file. That makes no sense.



  Try the following and see if it works. Post to the list if there are
  any problems with it.
 
  while (<INFILE>) {
 
  ($year, $month, $mday, $hour, $minute, $second, $timezone) =
  /^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/;
 
  $year and last; # if we've matched the date line, then bail out.
 
  eof and print STDERR "Didn't find date line";
  }
 
  print OUTFILE "$month/$year...";
 

 --- Returns empty strings.

I've taken another look, and I would expect it to either
print the error or print the match.

change the final print line to something like:

print OUTFILE "TEST: $month/$year...";

and see if the 'TEST' appears.

If it does, well, it's as if the INFILE loop isn't happening.

  # Now have date info and we're part way through file
 
  while (<INFILE>) {
 
  ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
  /^([A-Z]{3})( [A-Za-z]+)+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
 

 -- Truncates $cur_desc after first word.

I've reflected briefly, and I've no idea on that.
I can't see how it's possible.


  Loses the date values.

Eh? What's this part got to do with the date values?
How can setting these variables have anything to do
with the other variables? I suspect I don't understand
your terminology here.




Re: Quick Perl Question

2001-06-19 Thread Jack Lauman

Using the code below, currency.csv contains the line shown here.  The date
has been adjusted from 2000-12-30 00:16:19 UTC to PST.  The rest of the file
is still not being processed.

2000-12-29,16:16:19,PST


#!/usr/bin/perl
#
# cur2csv.pl
#

use strict;
use vars qw($started);
use vars qw($quote_date $cur_sym $cur_desc $usd_unit $units_usd);
use vars qw($year $month $mday $hour $minute $second $timezone);
use vars qw($conv_date $date $time $tz);


use Date::Manip;
use String::Strip;

use DBI;
use DBD::Pg;

open (OUTFILE, ">", "currency.csv") || die "Can not open currency.csv for writing";

printf STDERR "Reading currency file...";
open (INFILE, "curtest") || die "Can not open /var/spool/mail/currency for reading";

while (<INFILE>) {


# Extract date and time of Currency Rate Quotation

($year, $month, $mday, $hour, $minute, $second, $timezone) =
/^Rates as of (\d+).(\d+).(\d+) (\d+):(\d+):(\d+) (\w+) (.*)$/; 

# Convert date from UTC (GMT) to PST8PDT and adjust date and time accordingly.

$tz = Date_TimeZone;
$conv_date = "$year-$month-$mday $hour:$minute:$second";
$conv_date = ParseDate($conv_date);
$conv_date = Date_ConvTZ($conv_date, $timezone, $tz);
$date  = UnixDate($conv_date,"%Y-%m-%d");
$time  = UnixDate($conv_date,"%H:%M:%S");
$tz= UnixDate($conv_date,"%Z");

$year and last;# If we've matched the date line, then bail out.

eof and print STDERR "Didn't find the date line";

}

# Extract the ISO 4217 Code for Currencies and Funds (1995)
# Extract the Currency Description, and trim the trailing spaces
# Extract US Dollars to Units rate, and trim the leading/trailing spaces
# Extract Units to US Dollars rate, and trim the leading/trailing spaces

while (<INFILE>) {

($cur_sym, $cur_desc, $usd_unit, $units_usd) =
/^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

$cur_sym and $started++;

not $cur_sym and ($started and last) or next;

substr($cur_desc, 0, 1) = '';  # Delete leading space

}   


printf OUTFILE "%s\,%s\,%s\,%s\,%s\,%s\,%s\n",
$date, $time, $tz, $cur_sym, $cur_desc, $usd_unit, $units_usd;

close(INFILE);
close(OUTFILE);
print STDERR "\n";

1;



Re: Quick Perl Question

2001-06-19 Thread Me

 printf OUTFILE "%s\,%s\,%s\,%s\,%s\,%s\,%s\n",
 $date, $time, $tz, $cur_sym, $cur_desc, $usd_unit, $units_usd;

 close(INFILE);
 close(OUTFILE);
 print STDERR "\n";

 1;

You seem to be misunderstanding one particular
aspect of perl. Given the following:

while (<INFILE>) {

# do something with each line encountered
# if condition, quit this while loop

}

# do something after first while loop

while (<INFILE>) {

# do something else with each line encountered
# if another condition, quit this while loop

}

# do something after second while loop

The first do something happens repeatedly until
the first condition applies or the end of INFILE
is reached, whichever comes first. In other words,
it can happen zero times, or a dozen times, or
whatever.

The 'do something after first while loop' happens once.

The second loop starts off, in terms of lines from
INFILE, where the first left off. Other than that, a
similar deal regarding the 'do something's applies
to this second set of perl code as it does to the first
set of perl code.

In your case, the printf is outside the loops, so it will
only happen once.
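
A minimal sketch of that behaviour, with an in-memory filehandle standing in for INFILE and made-up input lines. The second loop picks up on the line after the one where the first loop stopped, and the work that should happen per line sits inside the loop rather than after it.

```perl
use strict;
use warnings;

# Made-up input; an in-memory filehandle stands in for INFILE.
my $text = "header\nRates as of ...\nUSD one\nEUR two\n\nfooter\n";
open(my $in, '<', \$text) or die "can't open in-memory file: $!";

# First loop: read until the date line, then leave the loop.
while (<$in>) {
    last if /^Rates as of/;
}

# Second loop: carries on from the line after the one the first loop stopped on.
my @rows;
while (<$in>) {
    last unless /^([A-Z]{3}) /;   # first non-currency line ends the loop
    push @rows, $1;               # work done INSIDE the loop runs once per line
}

print "currency lines seen: @rows\n";
```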

-

Other than that, given what printed, it's clear the second
regex isn't matching at all. This isn't too surprising -- I
didn't put a lot of effort in to it, so it might just be wrong.
If it doesn't match, then the variables aren't set.

To tighten the verification a little, add:

$started or print STDERR "Didn't find a currency line";

after the second loop.

This will surely print out, which shows that the regex
didn't match. In other words:

($cur_sym, $cur_desc, $usd_unit, $units_usd) =
/^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

Doesn't match:

USD United States Dollars 1.0
1.0

I can't see why not.

One other possibility is that the second while loop
isn't even being entered; stick a print "TEST" at the
start of the second loop to verify that it is being
entered.
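
One difference worth checking: the pattern quoted in this message has ( [A-Za-z])+ where the one suggested earlier in the thread has ( [A-Za-z]+)+. The first form allows only a single letter after each space, so it cannot get past a word like United. A quick sketch of the comparison follows; the sample line is an assumed reconstruction, since the whitespace in the quoted example may have been collapsed.

```perl
use strict;
use warnings;

# Assumed sample line, based on the example quoted in the thread.
my $line = "USD United States Dollars   1.0   1.0";

# Pattern as posted in the script (one letter per word).
my $as_posted   = qr/^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
# Pattern as originally suggested (one or more letters per word).
my $as_intended = qr/^([A-Z]{3})( [A-Za-z]+)+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

printf "single-letter form: %s\n", ($line =~ $as_posted   ? "match" : "no match");
printf "one-or-more form:   %s\n", ($line =~ $as_intended ? "match" : "no match");
```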




Re: Quick Perl Question

2001-06-19 Thread Jack Lauman

The second loop is executing.  The TEST statement worked.

 This will surely print out, which shows that the regex
 didn't match. In other words:
 
 ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
 /^([A-Z]{3})( [A-Za-z])+\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;
 
 Doesn't match:
 
 USD United States Dollars 1.0
 1.0
 
 I can't see why not.
 

The Currency part of the email has a fixed format that is never
deviated from:

1-3     $cur_sym
4       space
5-32    $cur_desc
33-35   (3) spaces
36-55   d8.d10 (.00)
56-58   (3) spaces
59-78   d8.d10 (.00)

I tried to shorten it to match the $cur_sym only but couldn't get it
to work.  I also tried the following:

($cur_sym, $cur_desc, $usd_unit, $units_usd) =
/^([A-Z]{3}) ([A-Za-z]{28})   (\d{7}\.\d{10})   (\d{7}\.\d{10})\s*$/;

Any suggestions?

Jack



Re: Quick Perl Question

2001-06-19 Thread Me

 The second loop is executing.  The TEST statement worked.

Ok.

 The Currency part of the email has a fixed format that is never
 deviated from:
 
 1-3 $cur_sym
 4 space
 5-32 $cur_desc
 33-35 (3) spaces
 36-55 d8.d10 (.00)
 56-58 (3) spaces
 59-78 d8.d10 (.00)

This doesn't precisely match your example:

  USD United States Dollars 1.0
  1.0

This includes numbers that are space padded in front
and have just five zeroes after the decimal point (no
doubt also space padded for the remaining decimal
places).
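
If the layout really is fixed-width, one alternative to a regex is unpack, which slices by column. This is a sketch under that assumption, not something proposed in the thread: the sub name parse_fixed and the sample line are hypothetical, and the column widths are taken from the layout quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a fixed-width currency line by column.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^[A-Z]{3} /;       # not a currency line
    my $padded = sprintf "%-78s", $line;       # pad so short lines still unpack
    # cols 1-3, skip 1, cols 5-32, skip 3, cols 36-55, skip 3, cols 59-78
    my ($sym, $desc, $usd_unit, $units_usd) =
        unpack "A3 x A28 x3 A20 x3 A20", $padded;
    s/^\s+// for $usd_unit, $units_usd;        # numbers are space padded in front
    return ($sym, $desc, $usd_unit, $units_usd);
}

# Made-up sample built to the stated widths.
my $sample = sprintf "%-3s %-28s   %20s   %20s",
    "USD", "United States Dollars", "1.00000", "1.00000";
print join(",", parse_fixed($sample)), "\n";
```

The A format drops trailing spaces on its own, so only the leading padding on the two numeric fields needs trimming by hand.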

 I tried to shorten it to match the $cur_sym only

Good move.

 but couldn't get it to work

Bummer.

 I also tried the following: 
 ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
 /^([A-Z]{3}) ([A-Za-z]{28})   (\d{7}\.\d{10})   (\d{7}\.\d{10})\s*$/;

\d matches a digit. In your example, there are spaces
where the above regex expects digits.

Are you sure my original currency line regex isn't matching?

If you are still doing the print-after-the-loop-ends
thing, then the variables will naturally be empty,
because the loop ends when the regex fails
to match, hence setting all the variables to null
just before it exits the loop.
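
To illustrate the point about the variables being emptied: a list assignment from a failed match sets every variable on the left to undef, including ones that held values from the previous line. A minimal sketch with made-up input and a simplified pattern:

```perl
use strict;
use warnings;

my ($cur_sym, $cur_desc);

# First line matches, so both variables are set.
($cur_sym, $cur_desc) = "USD Dollars" =~ /^([A-Z]{3}) (\w+)/;
print "after match:   ", (defined($cur_sym) ? $cur_sym : "(undef)"), "\n";

# Next line does not match: the match returns the empty list,
# and the assignment wipes out the old values.
($cur_sym, $cur_desc) = "-----" =~ /^([A-Z]{3}) (\w+)/;
print "after failure: ", (defined($cur_sym) ? $cur_sym : "(undef)"), "\n";
```

This is why anything that uses the matched values has to run inside the loop, before the next line is read.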

Going back to your attempt to get $cur_sym to 
match, try this as the first statements inside the
second loop:

($cur_sym) = /^([A-Z]{3})/;
print $cur_sym;

does that do anything?




Re: Quick Perl Question

2001-06-19 Thread Jack Lauman

Got a combination that sort of works.  It returns all the required
fields but truncates any line where $usd_unit or $units_usd has more
than 1 digit before the decimal point.  There can be as many as (8)
digits before and (10) digits after the decimal point in both cases.

Here's the regex I'm using:

($cur_sym, $cur_desc, $usd_unit, $units_usd) =
/^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;

As before I'm open to suggestion.

Thanks again,

Jack



Re: Quick Perl Question

2001-06-19 Thread Me

 Got a combination that sort of works.  It returns all the required
 fields but truncates any line where $usd_unit or $units_usd has more
 than 1 digit before the decimal point.  There can be as many as (8)
 digits before and (10) digits after the decimal point in both cases.
 
 Here's the regex I'm using:
 
 ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
 /^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;

/^

matches start of line. Ok.

[A-Z]{3}

matches 3 uppercase letters. Ok. 

([A-Z]{3})+

matches 3, 6, 9, ... uppercase letters and puts the
last set of 3 in to $cur_sym. Probably not what you
meant. You should stick to what we had before:

([A-Z]{3})

with a space following as the next matching character
of the pattern.

\s+

matches one or more spaces. Ok.

([A-Za-z\s]{28})

matches the next 28 alpha or whitespace characters.
(whitespace means spaces or tabs or newlines.) Ok.

+

matches the previous 28 character atom 1 or more
times, and returns the last 28 character match as the
second variable ($cur_desc). Not what you want.
Remove this extraneous +.

\s+

matches one or more whitespace characters. ok.

(\d+\.\d+)

matches one or more digits, followed by a literal period,
followed by one or more digits. ok.

+

matches the previous atom 1 or more times. Again,
not what you want. Remove the extraneous +.

\s+

matches one or more whitespace characters. ok.

(\d+\.\d+)

matches one or more digits, followed by a literal period,
followed by one or more digits. ok.

/;

means that anything can follow the rest of the pattern.

I'd recommend tightening the pattern up by making the
end be:

\s*$/;

which matches any amount of whitespace and then the
end of the line.

Did you spot your mistake? I didn't, but I'll let you tidy
up your regex first and see if you don't spot your problem.
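
The effect of a + after a capturing group can be seen in isolation. In this sketch (sample strings made up), the group matches several times but only the last repetition is kept; wrapping a non-capturing group (?: ... ) in an outer capture is one way to keep the whole run.

```perl
use strict;
use warnings;

# A repeated capture group keeps only its final repetition.
my ($last_three) = "USDEUR rest" =~ /^([A-Z]{3})+/;
print "repeated group: '$last_three'\n";     # the last set of three letters

my ($last_word) = "USD United States Dollars" =~ /^[A-Z]{3}( [A-Za-z]+)+/;
print "repeated group: '$last_word'\n";      # only the final word

# Capturing the whole run instead: repeat a non-capturing group inside one capture.
my ($all_words) = "USD United States Dollars" =~ /^[A-Z]{3}((?: [A-Za-z]+)+)/;
print "outer capture:  '$all_words'\n";
```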




Re: Quick Perl Question

2001-06-18 Thread Me

 1. I want to read in a text file and match any line that begins with
 three capital letters followed by a space.  i.e. USD 

while (<>) {

/^[A-Z]{3} / and dostuff; # $_ contains line

}


 2. I need to ignore any blank lines, lines containing all ---, lines
 containing all ===.

while (<>) {

/^(\s|-|=)*$/ and next;
/^[A-Z]{3} / and dostuff; # $_ contains line

}






Re: Quick Perl Question

2001-06-18 Thread Me

I forgot to explain.

  1. I want to read in a text file and match any line that begins with
  three capital letters followed by a space.  i.e. USD 

 while (<>) {

 will read from the file(s) you specify on the command
line when you run your perl script, ie

perl myscript.pl inputfile

or just

myscript.pl inputfile

The magical incantation:

while (<>) {

reads through the input file(s) a line at a time,
putting the line in $_, which is a special 'default'
variable that is assumed by lots of other perl
functions.

 /^[A-Z]{3} / and dostuff; # $_ contains line

Use

/stuff/

to match something in $_.

In a // expression, ^ matches at the start of a line (sorta).
($ matches at the end of a line.)

[abc] is a // expression atom that matches a, b, or c.
[a-c] would do the same job.

Following some atom with {min,max} tells perl how many
times to match.

The word 'and' means, if the thing on the left is true,
then also do the thing on the right.

dostuff was a made up name of a sub procedure that
you would have to declare elsewhere like this:

sub dostuff {

# code

}

  2. I need to ignore any blank lines, lines containing all ---, lines
  containing all ===.

 while (<>) {

 /^(\s|-|=)*$/ and next;

I got this wrong.

The basic principle was to use () brackets to turn the content
into a // expression atom, then use * after that to tell perl how
many times to match the atom. This is just like {min,max}. * is
shorthand for {0,infinity}.

The \s means match any whitespace character. If a bunch of
things in a //, or in a () enclosed atom, are separated by |
symbols, then perl can match any of the separated things.

I got it wrong because what I wrote will match, say:

 -= -= -= -= -= -= -= - =- =  -  -=

or other combinations.

This would more accurately fit what you asked for:

/^(\s*|-*|=*)$/ and next;

The word 'next' means to go on to the next iteration
in the loop code containing the 'next' command.
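
A quick way to see the difference between the two patterns is to run both over a few made-up lines. The variable names here are just labels for this sketch.

```perl
use strict;
use warnings;

my $loose  = qr/^(\s|-|=)*$/;     # any mixture of whitespace, dashes and equals
my $strict = qr/^(\s*|-*|=*)$/;   # all whitespace, OR all dashes, OR all equals

for my $line ("", "-----", "=====", "-= -= -=", "USD x") {
    printf "%-12s loose=%s strict=%s\n", "'$line'",
        ($line =~ $loose  ? "skip" : "keep"),
        ($line =~ $strict ? "skip" : "keep");
}
```

Both patterns also work on lines that still carry their trailing newline, because $ can match just before a final newline.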




RE: quick PERL question

2001-04-23 Thread King, Jason

M.W. Koskamp writes ..

The special variable $| sets the autoflush. See the perlvar documentation.
What this person does is a dirty way of setting $| to a true
value (not 0 or undef).
Default = 0.

why do you say 'dirty' ? .. do you just mean 'less readable' ? .. or are you
implying some other problem with $|++ ?
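
For comparison, a sketch of the spellings under discussion. Starting from the default of 0, $|++ and $| = 1 both leave $| true; STDOUT->autoflush(1), from the core IO::Handle module, is a more self-describing way to ask for the same thing on STDOUT.

```perl
use strict;
use warnings;
use IO::Handle;

$| = 0;                  # the default: output on the selected handle is buffered
$|++;                    # increments to a true value
print "after \$|++:      ", $|, "\n";

$| = 1;                  # the explicit spelling
print "after \$| = 1:    ", $|, "\n";

STDOUT->autoflush(1);    # same effect for STDOUT, via IO::Handle
print "after autoflush: ", $|, "\n";
```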

-- 
  jason king

  No children may attend school with their breath smelling of wild
  onions in West Virginia. - http://dumblaws.com/