Struggling with my first XML::Parser project
I have partially written a program to parse National Weather Service alerts. But I am struggling to figure out how to parse the file so that if there is an urgent alert, say, for a specific geocode or area, I can grab the headline and description for that geocode. I am working off the examples in the XML::Parser perldoc but don't understand how startElement, endElement and characterData work together.

Let me show you my code and an example from the file I am trying to parse. I have included a note in the code where I am getting stuck. I am looking for advice and explanations.

Code:

    use strict;
    use XML::Parser;
    use diagnostics;
    use vars qw(@array $data $xmlfile $count $tag $element $infile);

    my $xmlfile = "us.xml";
    die "Cannot find file \"$xmlfile\"" unless -f $xmlfile;
    $count = 0;
    $tag = "";
    $element = "";

    # retrieve complete file and fix errors in the file
    open (IN, "$xmlfile") || die("Error Reading File: $xmlfile $!");
    {
        local $/;    # slurp mode, restored at the end of this block
        $infile = <IN>;
    }
    close (IN) || die("Error Closing File: $xmlfile $!");
    $infile =~ s/&/&amp;/gm;

    # write complete file
    open (PROD, ">$xmlfile") || die("Error Writing to File: $xmlfile $!");
    print PROD $infile;
    close (PROD) || die("Error Closing File: $xmlfile $!");

    my $parser = new XML::Parser;
    $parser->setHandlers(
        Start   => \&startElement,
        End     => \&endElement,
        Char    => \&characterData,
        Default => \&default);
    $parser->parsefile($xmlfile);

    sub startElement {
        my( $parseinst, $element, %attrs ) = @_;
        #print "start element: $element\n";
    }

    sub endElement {
        my( $parseinst, $element ) = @_;
        # i am doing nothing with this
    }

    sub characterData {
        my( $parseinst, $data ) = @_;
        print "element: $element\n";
        print "data: $data\n";
        push @array, $data;
    }

    # NOTE: I can print the data and put it in an array, but it's useless
    # because each line of the array is just the data for that element.
    # I don't know how to get all of the data for each element under
    # cap:info onto one line (i.e. data for
    # category|event|urgency|severity... etc.) so I can split it and pull
    # out what I want. I don't even know if this is the best approach
    # because it seems I could do a parser myself without using
    # XML::Parser; it doesn't seem to be any easier this way. Also I see
    # that cap:area includes its own level in the hierarchy, areaDesc and
    # geocode, and don't know how that works, though when I print $data
    # I get that information.

    sub default {
        my( $parseinst, $data ) = @_;
    }

Example from file at http://www.nws.noaa.gov/alerts/us.cap:

    <cap:info>
      <cap:category>Met</cap:category>
      <cap:event>NON PRECIPITATION STATEMENT</cap:event>
      <cap:urgency>Unknown</cap:urgency>
      <cap:severity>Unknown</cap:severity>
      <cap:certainty>Unknown</cap:certainty>
      <cap:effective>2003-11-09T09:21:00</cap:effective>
      <cap:expires>2003-11-09T21:00:00</cap:expires>
      <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
      <cap:description>URGENT - WEATHER MESSAGE
      NATIONAL WEATHER SERVICE MORRISTOWN TN
      400 AM EST SUN NOV 9 2003

      ...FIRST HARD FREEZE OF THE FALL SEASON POSSIBLE ACROSS NORTHEAST
      TENNESSEE...NORTHERN PLATEAU...AND PARTS OF THE SMOKY MOUNTAINS
      SUNDAY NIGHT...

      .A RIDGE OF HIGH PRESSURE WILL CONTINUE TO BUILD INTO THE SOUTHERN
      APPALACHIANS TODAY ALLOWING WINDS TO BECOME LIGHT LATER TONIGHT.
      TEMPERATURES WILL BE ABLE TO DROP INTO THE UPPER 20S TO LOWER 30S BY
      DAYBREAK MONDAY ACROSS MUCH OF NORTHEAST TENNESSEE...SOUTHWEST
      VIRGINIA AND NORTHERN SECTIONS OF THE CUMBERLAND PLATEAU AS WELL AS
      THE SMOKY MOUNTAINS.

      STAY TUNED TO NOAA WEATHER RADIO AND OTHER LOCAL MEDIA FOR FURTHER
      DETAILS OR UPDATES.

      TNZ012017-035-041-042-044-046-072-074-092100-
      BLOUNT SMOKY MTN-CAMPBELL-CLAIBORNE-COCKE SMOKY MTN-HANCOCK-HAWKINS-
      MORGAN-NORTHWEST CARTER-NORTHWEST GREENE-SCOTT TN-SEVIER SMOKY MTN-
      SULLIVAN-WASHINGTON TN-
      INCLUDING THE CITIES OF...BRISTOL...COSBY...ELIZABETHTON...
      GATLINBURG...GREENEVILLE...JACKSBORO...JOHNSON CITY...ONEIDA...
      ROGERSVILLE...SNEEDVILLE...TAZEWELL...TOWNSEND...WARTBURG
      400 AM EST SUN NOV 9 2003

      ...FREEZE WATCH IS IN EFFECT FOR TONIGHT...

      LIGHT WINDS AND ONLY PARTLY CLOUDY SKIES WILL ALLOW TEMPERATURES TO
      DROP INTO THE UPPER 20S TO LOWER 30S BY DAYBREAK MONDAY.
      TEMPERATURES MAY DROP AT OR BELOW FREEZING FOR 1 TO 3 HOURS TONIGHT.

      ANY OUTSIDE PLANTS SUSCEPTIBLE TO FREEZE DAMAGE SHOULD BE BROUGHT
      INDOORS OR COVERED WITH MULCH OR PLASTIC.

      $$</cap:description>
      <cap:web>http://www.nws.noaa.gov/alerts/us.html#TNZ017.MRXNPWMRX.092100</cap:web>
      <cap:area>
        <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
        <cap:geocode>047163</cap:geocode>
      </cap:area>
    </cap:info>

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe:
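For readers following the thread, here is a minimal sketch of one way the three handlers can cooperate; it is an illustration, not the poster's final code. The idea is that the Char handler only accumulates text (expat may deliver one element's text in several chunks), the End handler stores the accumulated text under the element name, and the end of cap:info closes one record. The sample document, its feed wrapper element, the urn:example:cap namespace URI, and the names start_element, char_data and end_element are all assumptions made up for this example. It also keeps only the last geocode per cap:info, which may be too simple for the real feed.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical, trimmed-down sample shaped like the excerpt in the post.
my $xml = <<'XML';
<feed xmlns:cap="urn:example:cap">
  <cap:info>
    <cap:urgency>Unknown</cap:urgency>
    <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
    <cap:description>FREEZE WATCH IS IN EFFECT FOR TONIGHT</cap:description>
    <cap:area>
      <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
      <cap:geocode>047163</cap:geocode>
    </cap:area>
  </cap:info>
</feed>
XML

my (@alerts, %info);   # finished records, and the record being built
my $text = '';         # text collected since the last start or end tag

sub start_element {
    my ($expat, $name, %attrs) = @_;
    %info = () if $name eq 'cap:info';   # a new alert begins
    $text = '';
}

sub char_data {
    my ($expat, $chunk) = @_;
    $text .= $chunk;                     # only accumulate, never decide here
}

sub end_element {
    my ($expat, $name) = @_;
    if ($name eq 'cap:info') {
        push @alerts, { %info };         # one complete record per cap:info
    }
    else {
        (my $value = $text) =~ s/^\s+|\s+$//g;
        $info{$name} = $value if length $value;   # skip whitespace-only containers
    }
    $text = '';
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start_element,
        End   => \&end_element,
        Char  => \&char_data,
    },
);
$parser->parse($xml);

# Pick the headline and description for one geocode.
for my $alert (grep { ($_->{'cap:geocode'} // '') eq '047163' } @alerts) {
    print "$alert->{'cap:headline'}\n$alert->{'cap:description'}\n";
}
```

Because the child elements of cap:area are stored under their own names, areaDesc and geocode end up in the same hash as headline and description, so no joining and splitting of a delimited line is needed.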
RE: Struggling with my first XML::Parser project
Try XML::Simple: http://search.cpan.org/~grantm/XML-Simple/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gary Nielson
Sent: Sunday, November 09, 2003 9:35 PM
To: [EMAIL PROTECTED]
Subject: Struggling with my first XML::Parser project
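As a rough illustration of the XML::Simple suggestion in this reply, the sketch below reads a small document into nested Perl data and then filters on the geocode. The sample XML, its feed wrapper element and the urn:example:cap namespace URI are assumptions invented for the example; the real feed may be laid out differently. ForceArray and KeyAttr are set explicitly so that cap:info and cap:area are always array references, even when only one of each is present.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical, trimmed-down sample shaped like the excerpt in the question.
my $xml = <<'XML';
<feed xmlns:cap="urn:example:cap">
  <cap:info>
    <cap:urgency>Unknown</cap:urgency>
    <cap:headline>NON PRECIPITATION STATEMENT</cap:headline>
    <cap:description>FREEZE WATCH IS IN EFFECT FOR TONIGHT</cap:description>
    <cap:area>
      <cap:areaDesc>Sullivan (Tennessee)</cap:areaDesc>
      <cap:geocode>047163</cap:geocode>
    </cap:area>
  </cap:info>
</feed>
XML

# Parse the whole document into a hash of hashes and arrays.
my $feed = XMLin(
    $xml,
    ForceArray => ['cap:info', 'cap:area'],   # always arrays, even for one item
    KeyAttr    => [],                         # no folding of arrays into hashes
);

for my $info (@{ $feed->{'cap:info'} }) {
    # Collect every geocode listed under this alert's areas.
    my @codes = map { $_->{'cap:geocode'} } @{ $info->{'cap:area'} };
    next unless grep { $_ eq '047163' } @codes;
    print "$info->{'cap:headline'}\n$info->{'cap:description'}\n";
}
```

This trades the handler bookkeeping for holding the whole document in memory, which is usually acceptable for a feed of this size.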