Thank you, Brian. Your reply is indeed most eloquent. Jon Bjornstad has offered a more traditional regexp solution that I hope to comprehend, if only to sharpen my Perl skills. (I am waiting for his reply to my admission that I do not yet understand it.)
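For readers comparing the two approaches, here is a minimal regexp sketch. It is not Jon's solution, which is not reproduced in this thread. It is one way to answer the original question of filling $start_date without the temporary @ans. The sketch makes two assumptions. First, the whole file has been slurped into one string, so a tag wrapped across lines still matches. Second, the markup looks like the sample quoted in the original question. The heredoc stands in for the real file. Like any regexp over HTML, it will break if the page's markup changes, which is the risk Brian's parser-based reply avoids.

```perl
use strict;
use warnings;

# Stand-in for the real file. In practice (assumption) the file would be
# slurped into one string, e.g.
#   open my $fh, '<', "$sourcedir\\$filename" or die "Can not open: $!\n";
#   my $html = do { local $/; <$fh> };
my $html = <<'HTML';
<span id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel"
style="font-weight:bold;">2011/10/29</span><span id="Label6"
style="font-weight:bold;"> - </span><span id="EndWeekLabel"
style="font-weight:bold;">2011/11/04</span>
HTML

# A match in list context returns its captures (year, month, day), and
# join() glues them together, so no temporary array is needed.
# [^>]* skips the rest of the opening <span ...> tag, newlines included.
my $start_date = join '',
    $html =~ m{id="StartWeekLabel"[^>]*>(\d{4})/(\d{2})/(\d{2})</span>};

length $start_date or die "No date found after StartWeekLabel\n";
print "$start_date\n";    # prints 20111029
```

The `[^>]*` is a deliberate change from the original `.+`. A greedy `.+` backtracks from the far end of the string, so if both dates sit on the same line it can capture the EndWeekLabel date (2011/11/04) instead of the start date.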
I hope to test and share Jon's answer with the community. For now, I will leverage the HTML library APIs.

Sincerely,
Paul

> From: bra...@nyx.com
> To: perl-win32-users@listserv.activestate.com
> Date: Thu, 3 Nov 2011 10:37:03 +0000
> Subject: RE: How to Extract a Date from a File
>
> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul
> Rousseau
> Sent: 02 November 2011 16:08
> To: perl Win32-users
> Subject: How to Extract a Date from a File
>
> > Hello Perl folks,
> >
> > I would like to know if there is an eloquent way of extracting a date
> > string from a file.
> >
> > My code goes like this:
> >
> > open (INFILE, "<$sourcedir\\$filename") || die "Can not open $sourcedir\\$filename $!\n";
> > @filecontents = <INFILE>;
> > close INFILE;
> > @filecontents = map {chomp; $_} @filecontents;
> >
> > #
> > # Within the file contents, look for the text, CurrentWeekLabel
> > #
> > # Here is a text sample.
> > #
> > # <div style="TEXT-ALIGN: center; min-width: 750px">
> > # <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span
> > # id="CurrentWeekLabel">Week Of:
> > # </span><span id="StartWeekLabel"
> > # style="font-weight:bold;">2011/10/29</span><span id="Label6"
> > # style="font-weight:bold;"> - </span><span id="EndWeekLabel"
> > # style="font-weight:bold;">2011/11/04</span></div>
> > # <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a
> > # id="PreviousWeekLinkButton"
> > # class="LinkButton" href="javascript:OnPreviousWeekLinkButtonClick ()"
> > # href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span
> > # id="Label20"> | </span><a
> > # onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton');
> > # return false;"
> > # id="SelectWeekLinkButton" class="LinkButton"
> > # href="javascript:__doPostBack('SelectWeekLinkButton','')">Select
> > # Week</a><span id="Label8"> | </span><a id="NextWeekLinkButton"
> > # class="LinkButton"
> > # href="javascript:OnNextWeekLinkButtonClick ()"
> > # href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
> > # <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span
> > # id="StatusLabel"
> > # class="StatusLabel"></span></div>
> > # </div>
> > #
> > # Obtain the year, month and day following the text, StartWeekLabel
> > #
> > @ans = grep (/StartWeekLabel.+\>(\d{4})\/(\d{2})\/(\d{2})\<\/span/si, @filecontents);
> > #
> > # Build the start date from the matches.
> > #
> > $start_date = $1 . $2 . $3
> >
> > I was wondering if there was a neat way to avoid using @ans as a temporary
> > variable, and extract the
> > "2011/10/29" straight into $start_date so that $start_date = "20111029"
>
> Using regular expressions is not usually recommended. Prefer to use the
> modules that specialise in doing that. Also, there may be alternate ways to
> extract the date elements, and modules to validate them. For example...
>
> -----------------------------------------------------------
> use strict;
> use warnings;
>
> use HTML::TreeBuilder;
> use Date::Calc qw{check_date};
>
> my $root = HTML::TreeBuilder->new_from_file(*DATA);
> defined $root or die "Failed to parse\n";
> my $element = $root->look_down("id", "StartWeekLabel");
> defined $element or die "Failed to locate id=StartWeekLabel\n";
> my $rawdate = $element->as_trimmed_text();
> print "Raw date '$rawdate'\n";
> my @date = split "/", $rawdate;
> if ((check_date(@date))) {
>     print "Date looks OK: '", @date, "'\n";
> }
> else {
>     print "That date looks invalid\n";
> }
>
> __DATA__
> <div style="TEXT-ALIGN: center; min-width: 750px">
> <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span
> id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel"
> style="font-weight:bold;">2011/10/29</span><span id="Label6"
> style="font-weight:bold;"> - </span><span id="EndWeekLabel"
> style="font-weight:bold;">2011/11/04</span></div>
> <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a
> id="PreviousWeekLinkButton" class="LinkButton"
> href="javascript:OnPreviousWeekLinkButtonClick ()"
> href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span
> id="Label20"> | </span><a
> onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton');
> return false;" id="SelectWeekLinkButton" class="LinkButton"
> href="javascript:__doPostBack('SelectWeekLinkButton','')">Select
> Week</a><span id="Label8"> | </span><a id="NextWeekLinkButton"
> class="LinkButton" href="javascript:OnNextWeekLinkButtonClick ()"
> href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
> <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span
> id="StatusLabel" class="StatusLabel"></span></div>
> </div>
> -----------------------------------------------------------
>
> --
> Brian Raven
>
> Please consider the environment before printing this e-mail.
>
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs