Struggling with my first XML::Parser project

2003-11-09 Thread Gary Nielson
I have partially written a program to parse National Weather Service alerts.
But I am struggling to figure out how to parse the file so that if there is an
urgent alert, say, for a specific geocode or area, I can grab the headline
and description for that geocode.

I am working off the examples in the XML::Parser perldoc but don't
understand how startElement, endElement and characterData work together. Let
me show you my code and an example from the file I am trying to parse. I
have included a note in the code where I am getting stuck. I am looking for
advice and explanations.

Code:

use strict;
use XML::Parser;
use diagnostics;
use vars qw(@array $data $xmlfile $count $tag $element $infile);

my $xmlfile = "us.xml";
die "Cannot find file \"$xmlfile\""
   unless -f $xmlfile;

$count = 0;
$tag = "";
$element = "";

# retrieve complete file and fix errors in the file
open (IN, "<$xmlfile") || die("Error Reading File: $xmlfile $!");
{
undef $/;
$infile = <IN>;
}
close (IN) || die("Error Closing File: $xmlfile $!");

$infile =~ s/&/&amp;/gm;

# write complete file
open (PROD, ">$xmlfile") || die("Error Writing to File: $xmlfile $!");
print PROD $infile;
close (PROD) || die("Error Closing File: $xmlfile $!");

my $parser = new XML::Parser;

$parser->setHandlers( Start => \&startElement,
   End => \&endElement,
   Char => \&characterData,
   Default => \&default);

$parser->parsefile($xmlfile);

sub startElement {
   my( $parseinst, $element, %attrs ) = @_;
 #print "start element: $element\n";
}

sub endElement {
   my( $parseinst, $element ) = @_;
# i am doing nothing with this
  }

sub characterData {
  my( $parseinst, $data ) = @_;
print "element: $element\n";
print "data: $data\n";
push @array, $data;

}

NOTE: I can print the data and put it in an array, but it's useless because
each entry of the array is just the data for one element. I don't know how
to get all of the data for each element under cap:info onto one line (i.e.
data for category|event|urgency|severity, etc.) so I can split it and pull
out what I want. I don't even know if this is the best approach because it
seems I could write a parser myself without using XML::Parser; it doesn't
seem to be any easier this way.

Also I see that cap:area includes its own level in the hierarchy, areaDesc
and geocode, and I don't know how that works, though when I print $data I get
that information.


sub default {
   my( $parseinst, $data ) = @_;
}

Example from file at http://www.nws.noaa.gov/alerts/us.cap.

<cap:info>
  <cap:category>Met</cap:category>
  <cap:event>NON PRECIPITATION STATEMENT</cap:event>
  <cap:urgency>Unknown</cap:urgency>
  <cap:severity>Unknown</cap:severity>
  <cap:certainty>Unknown</cap:certainty>
  <cap:effective>2003-11-09T09:21:00</cap:effective>
  <cap:expires>2003-11-09T21:00:00</cap:expires>
  <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
  <cap:description>URGENT - WEATHER MESSAGE NATIONAL WEATHER SERVICE
MORRISTOWN TN 400 AM EST SUN NOV 9 2003 ...FIRST HARD FREEZE OF THE FALL
SEASON POSSIBLE ACROSS NORTHEAST TENNESSEE...NORTHERN PLATEAU...AND PARTS OF
THE SMOKY MOUNTAINS SUNDAY NIGHT... .A RIDGE OF HIGH PRESSURE WILL CONTINUE
TO BUILD INTO THE SOUTHERN APPALACHIANS TODAY ALLOWING WINDS TO BECOME LIGHT
LATER TONIGHT. TEMPERATURES WILL BE ABLE TO DROP INTO THE UPPER 20S TO LOWER
30S BY DAYBREAK MONDAY ACROSS MUCH OF NORTHEAST TENNESSEE...SOUTHWEST
VIRGINIA AND NORTHERN SECTIONS OF THE CUMBERLAND PLATEAU AS WELL AS THE
SMOKY MOUNTAINS. STAY TUNED TO NOAA WEATHER RADIO AND OTHER LOCAL MEDIA FOR
FURTHER DETAILS OR UPDATES. TNZ012017-035-041-042-044-046-072-074-092100-
BLOUNT SMOKY MTN-CAMPBELL-CLAIBORNE-COCKE SMOKY MTN-HANCOCK-HAWKINS-
MORGAN-NORTHWEST CARTER-NORTHWEST GREENE-SCOTT TN-SEVIER SMOKY MTN-
SULLIVAN-WASHINGTON TN- INCLUDING THE CITIES
OF...BRISTOL...COSBY...ELIZABETHTON...
GATLINBURG...GREENEVILLE...JACKSBORO...JOHNSON CITY...ONEIDA...
ROGERSVILLE...SNEEDVILLE...TAZEWELL...TOWNSEND...WARTBURG 400 AM EST SUN NOV
9 2003 ...FREEZE WATCH IS IN EFFECT FOR TONIGHT... LIGHT WINDS AND ONLY
PARTLY CLOUDY SKIES WILL ALLOW TEMPERATURES TO DROP INTO THE UPPER 20S TO
LOWER 30S BY DAYBREAK MONDAY. TEMPERATURES MAY DROP AT OR BELOW FREEZING FOR
1 TO 3 HOURS TONIGHT. ANY OUTSIDE PLANTS SUSCEPTIBLE TO FREEZE DAMAGE SHOULD
BE BROUGHT INDOORS OR COVERED WITH MULCH OR PLASTIC. $$</cap:description>
  <cap:web>http://www.nws.noaa.gov/alerts/us.html#TNZ017.MRXNPWMRX.092100</cap:web>
  <cap:area>
    <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
    <cap:geocode>047163</cap:geocode>
  </cap:area>
</cap:info>
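
For reference, here is a minimal sketch of one way the three handlers can
cooperate; it is not the only approach. Start clears a text buffer, Char
appends to it (expat may call Char several times for one element), and End
files the buffered text under the element name. When cap:info closes, the
collected fields are saved as one record. The wrapping cap:alert element, the
placeholder namespace URI, and the names start_element, end_element,
character_data, %record and @alerts are assumptions made for this
illustration, and the sample is trimmed from the alert above.

```perl
use strict;
use warnings;
use XML::Parser;

# Trimmed sample; the cap:alert root and the namespace URI are placeholders.
my $xml = <<'XML';
<cap:alert xmlns:cap="urn:example:cap">
  <cap:info>
    <cap:event>NON PRECIPITATION STATEMENT</cap:event>
    <cap:urgency>Unknown</cap:urgency>
    <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
    <cap:area>
      <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
      <cap:geocode>047163</cap:geocode>
    </cap:area>
  </cap:info>
</cap:alert>
XML

my @alerts;      # one hash per completed cap:info
my %record;      # fields of the cap:info currently being read
my $text = '';   # character data collected for the current element

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start_element,
        End   => \&end_element,
        Char  => \&character_data,
    },
);
$parser->parse($xml);

sub start_element {
    my ($expat, $element, %attrs) = @_;
    %record = () if $element eq 'cap:info';   # a new alert begins
    $text = '';                               # fresh buffer for every element
}

sub character_data {
    my ($expat, $data) = @_;
    $text .= $data;   # append: one element's text can arrive in pieces
}

sub end_element {
    my ($expat, $element) = @_;
    if ($element eq 'cap:info') {
        push @alerts, { %record };            # save a copy of the finished record
    }
    elsif ($element ne 'cap:area' && $element ne 'cap:alert') {
        (my $value = $text) =~ s/^\s+|\s+$//g;
        # Children of cap:area land in the same flat record; a second
        # cap:area in the same cap:info would overwrite the first.
        $record{$element} = $value;
    }
    $text = '';
}

# Pull out the fields wanted for one geocode.
for my $alert (@alerts) {
    next unless defined $alert->{'cap:geocode'}
        && $alert->{'cap:geocode'} eq '047163';
    print join('|', map { $alert->{$_} } qw(cap:headline cap:urgency cap:geocode)), "\n";
}
```

With the sample above this prints
NON PRECIPITATION STATEMENT|Unknown|047163. Keeping each record as a hash
avoids joining the fields into one line and splitting them again later.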


___
Perl-Win32-Users mailing list

RE: Struggling with my first XML::Parser project

2003-11-09 Thread Burak Gürsoy
try XML::Simple => http://search.cpan.org/~grantm/XML-Simple/
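
A minimal sketch of what that might look like, assuming the alert is wrapped
in a cap:alert root element (the root and the placeholder namespace URI are
assumptions, and the sample is trimmed from the posted alert). XMLin reads the
whole document into nested hashes and arrays, so the headline for a given
geocode can be found with ordinary loops:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Trimmed sample; the cap:alert root and the namespace URI are placeholders.
my $xml = <<'XML';
<cap:alert xmlns:cap="urn:example:cap">
  <cap:info>
    <cap:urgency>Unknown</cap:urgency>
    <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
    <cap:area>
      <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
      <cap:geocode>047163</cap:geocode>
    </cap:area>
  </cap:info>
</cap:alert>
XML

# ForceArray keeps cap:info and cap:area as lists even when only one is
# present; KeyAttr => [] turns off the default folding on name/key/id.
my $alert = XMLin($xml, ForceArray => ['cap:info', 'cap:area'], KeyAttr => []);

my @headlines;
for my $info (@{ $alert->{'cap:info'} }) {
    for my $area (@{ $info->{'cap:area'} }) {
        next unless $area->{'cap:geocode'} eq '047163';
        push @headlines, $info->{'cap:headline'};
    }
}
print "$_\n" for @headlines;
```

This trades memory for convenience: the whole file is held in memory at once,
which is usually fine for a feed of this size.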

