Thank you Brian.
 
Your reply is indeed most eloquent.
 
Jon Bjornstad has offered a more traditional regexp solution that I hope to 
comprehend, if only to sharpen my Perl skills. (I am waiting for his response 
to my lack thereof.)

I hope to test and share Jon's answer with the community. For now, I will 
leverage the HTML library APIs.
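
For anyone following along, here is a minimal regex sketch of the one-step extraction I originally asked about. It is not Jon's solution, which I have yet to test. The helper name extract_start_date and the sample string are made up for illustration, and it assumes the whole file has been read into a single scalar so that the match can span line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the StartWeekLabel
# date as YYYYMMDD, or undef when no such date is found.
sub extract_start_date {
    my ($html) = @_;
    # [^>]* keeps the match inside the opening <span ...> tag (it also
    # crosses newlines), so the capture cannot run on to EndWeekLabel.
    if ($html =~ m{id="StartWeekLabel"[^>]*>\s*(\d{4})/(\d{2})/(\d{2})\s*</span>}i) {
        return "$1$2$3";
    }
    return;
}

# Sample adapted from the page fragment quoted below, with a line break
# inside the tag to mimic the real file.
my $sample = <<'HTML';
<span id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel"
style="font-weight:bold;">2011/10/29</span><span id="Label6"
style="font-weight:bold;"> - </span><span id="EndWeekLabel"
style="font-weight:bold;">2011/11/04</span></div>
HTML

my $start_date = extract_start_date($sample);
print defined $start_date ? "$start_date\n" : "no date found\n";   # prints 20111029
```

Matching against one scalar rather than grepping an array of lines avoids the temporary @ans and lets the captures be used directly. It remains fragile if the page markup changes, which is why I am using the HTML module for now.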
 
Sincerely,
 
Paul
 


> From: bra...@nyx.com
> To: perl-win32-users@listserv.activestate.com
> Date: Thu, 3 Nov 2011 10:37:03 +0000
> Subject: RE: How to Extract a Date from a File
> 
> From: perl-win32-users-boun...@listserv.activestate.com 
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul 
> Rousseau
> Sent: 02 November 2011 16:08
> To: perl Win32-users
> Subject: How to Extract a Date from a File
> 
> > Hello Perl folks,
> >
> >
> > I would like to know if there is an eloquent way of extracting a date 
> > string from a file.
> >
> > My code goes like this:
> >
> > open (INFILE, "<$sourcedir\\$filename") || die "Can not open 
> > $sourcedir\\$filename $!\n";
> > @filecontents = <INFILE>;
> > close INFILE;
> > @filecontents = map {chomp; $_} @filecontents;
> >
> > #
> > # Within the file contents, look for the text, CurrentWeekLabel
> > #
> > # Here is a text sample.
> > #
> > # <div style="TEXT-ALIGN: center; min-width: 750px">
> > # <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span 
> > id="CurrentWeekLabel">Week Of:
> > </span><span id="StartWeekLabel" 
> > style="font-weight:bold;">2011/10/29</span><span id="Label6" 
> > style="font-weight:bold;"> - </span><span id="EndWeekLabel" 
> > style="font-weight:bold;">2011/11/04</span></div>
> > # <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a 
> > id="PreviousWeekLinkButton"
> > class="LinkButton" href="javascript:OnPreviousWeekLinkButtonClick ()"
> > href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span 
> > id="Label20">&nbsp;|&nbsp;</span><a 
> > onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton'); 
> > return false;"
> > id="SelectWeekLinkButton" class="LinkButton" 
> > href="javascript:__doPostBack('SelectWeekLinkButton','')">Select 
> > Week</a><span id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" 
> > class="LinkButton"
> > href="javascript:OnNextWeekLinkButtonClick ()"
> > href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
> > # <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span 
> > id="StatusLabel"
> > class="StatusLabel"></span></div>
> > # </div>
> > #
> > # Obtain the year, month and day following the text, StartWeekLabel
> > #
> > @ans = grep (/StartWeekLabel.+\>(\d{4})\/(\d{2})\/(\d{2})\<\/span/si, 
> > @filecontents);
> > #
> > # Build the start date from the matches.
> > #
> > $start_date = $1 . $2 . $3;
> >
> > I was wondering if there was a neat way to avoid using @ans as a temporary 
> > variable, and extract the
> > "2011/10/29" straight into $start_date so that $start_date = "20111029"
> 
> Using regular expressions to parse HTML is not usually recommended. Prefer the 
> modules that specialise in parsing it. There may also be alternative ways to 
> extract the date elements, and modules to validate them. For example...
> 
> -----------------------------------------------------------
> use strict;
> use warnings;
> 
> use HTML::TreeBuilder;
> use Date::Calc qw{check_date};
> 
> my $root = HTML::TreeBuilder->new_from_file(*DATA);
> defined $root or die "Failed to parse\n";
> my $element = $root->look_down("id", "StartWeekLabel");
> defined $element or die "Failed to locate id=StartWeekLabel\n";
> my $rawdate = $element->as_trimmed_text();
> print "Raw date '$rawdate'\n";
> my @date = split "/", $rawdate;
> if (check_date(@date)) {
>     print "Date looks OK: '", @date, "'\n";
> }
> else {
>     print "That date looks invalid\n";
> }
> 
> __DATA__
> <div style="TEXT-ALIGN: center; min-width: 750px">
> <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span 
> id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel" 
> style="font-weight:bold;">2011/10/29</span><span id="Label6" 
> style="font-weight:bold;"> - </span><span id="EndWeekLabel" 
> style="font-weight:bold;">2011/11/04</span></div>
> <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a 
> id="PreviousWeekLinkButton" class="LinkButton" 
> href="javascript:OnPreviousWeekLinkButtonClick ()" 
> href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span 
> id="Label20">&nbsp;|&nbsp;</span><a 
> onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton'); 
> return false;" id="SelectWeekLinkButton" class="LinkButton" 
> href="javascript:__doPostBack('SelectWeekLinkButton','')">Select 
> Week</a><span id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" 
> class="LinkButton" href="javascript:OnNextWeekLinkButtonClick ()" 
> href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
> <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span 
> id="StatusLabel" class="StatusLabel"></span></div>
> </div>
> -----------------------------------------------------------
> 
> --
> Brian Raven
> 
> 
> 
> 
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
                                          