Re: How to Extract a Date from a File
Hi,

This is usually what I do:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $startDate;
    # Read every line of the file(s) named on the command line.
    while (<>) {
        # Non-greedy .*? picks the first date that follows StartWeekLabel.
        if ($_ =~ /StartWeekLabel.*?([\d]{4})\/([\d]{2})\/([\d]{2}).*?\/span/i) {
            $startDate = "$1$2$3";
        }
    }
    defined $startDate or die "No start date found\n";
    print "$startDate\n";

Call the script with the text file as a parameter:

    perl myscript.pl mytextfile.txt

If you want to search multiple files, just add them as well:

    perl myscript.pl mytextfile.txt mytextfile2.txt etc

/Michael

___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
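The original question asked for a way to get the date into $start_date without a temporary array. A minimal sketch of one way to do that, using the same non-greedy pattern as above on a string that already holds the file contents. The helper name extract_start_date and the shortened sample markup are assumptions made for illustration, not part of anyone's posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start date as YYYYMMDD, or undef
# (an empty list in list context) when no StartWeekLabel date is found.
sub extract_start_date {
    my ($html) = @_;
    # /s lets . cross newlines, so the whole file can be passed as one string.
    return "$1$2$3"
        if $html =~ m{StartWeekLabel.*?(\d{4})/(\d{2})/(\d{2}).*?/span}is;
    return;
}

# Shortened sample markup, modelled on the page fragment in the question.
my $html = join "\n",
    '<div>',
    '<span id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span>',
    '<span id="EndWeekLabel" style="font-weight:bold;">2011/11/04</span>',
    '</div>';

my $start_date = extract_start_date($html);
print defined $start_date ? "$start_date\n" : "no start date found\n";
```

Returning as soon as the match succeeds means $1, $2 and $3 are only read when they are known to be set.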
RE: How to Extract a Date from a File
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul Rousseau
Sent: 02 November 2011 16:08
To: perl Win32-users
Subject: How to Extract a Date from a File

> Hello Perl folks,
>
> I would like to know if there is an eloquent way of extracting a date
> string from a file.
>
> My code goes like this:
>
> open (INFILE, "$sourcedir\\$filename") || die "Can not open $sourcedir\\$filename $!\n";
> @filecontents = <INFILE>;
> close INFILE;
> @filecontents = map {chomp; $_} @filecontents;
> #
> # Within the file contents, look for the text, CurrentWeekLabel
> #
> # Here is a text sample.
> #
> # <div style="TEXT-ALIGN: center; min-width: 750px">
> #   <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span><span id="Label6" style="font-weight:bold;"> - </span><span id="EndWeekLabel" style="font-weight:bold;">2011/11/04</span></div>
> #   <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a id="PreviousWeekLinkButton" class="LinkButton" href="javascript:OnPreviousWeekLinkButtonClick ()" href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span id="Label20">&nbsp;|&nbsp;</span><a onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton'); return false;" id="SelectWeekLinkButton" class="LinkButton" href="javascript:__doPostBack('SelectWeekLinkButton','')">Select Week</a><span id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" class="LinkButton" href="javascript:OnNextWeekLinkButtonClick ()" href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
> #   <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span id="StatusLabel" class="StatusLabel"></span></div>
> # </div>
> #
> # Obtain the year, month and day following the text, StartWeekLabel
> #
> @ans = grep (/StartWeekLabel.+\>(\d{4})\/(\d{2})\/(\d{2})\<\/span/si, @filecontents);
> #
> # Build the start date from the matches.
> #
> $start_date = $1 . $2 . $3
>
> I was wondering if there was a neat way to avoid using @ans as a
> temporary variable, and extract the 2011/10/29 straight into
> $start_date so that
>
> $start_date = 20111029

Using regular expressions is not usually recommended. Prefer to use the modules that specialise in doing that. Also, there may be alternate ways to extract the date elements, and modules to validate them. For example...

---
use strict;
use warnings;

use HTML::TreeBuilder;
use Date::Calc qw{check_date};

# Parse the sample page held in the __DATA__ section below.
my $root = HTML::TreeBuilder->new_from_file(*DATA);
defined $root or die "Failed to parse\n";

# Find the element whose id attribute is StartWeekLabel.
my $element = $root->look_down("id", "StartWeekLabel");
defined $element or die "Failed to locate id=StartWeekLabel\n";

my $rawdate = $element->as_trimmed_text();
print "Raw date '$rawdate'\n";

# Split into (year, month, day) and check that it is a real calendar date.
my @date = split "/", $rawdate;
if ((check_date(@date))) {
    print "Date looks OK: '", @date, "'\n";
}
else {
    print "That date looks invalid\n";
}

__DATA__
<div style="TEXT-ALIGN: center; min-width: 750px">
  <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span><span id="Label6" style="font-weight:bold;"> - </span><span id="EndWeekLabel" style="font-weight:bold;">2011/11/04</span></div>
  <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a id="PreviousWeekLinkButton" class="LinkButton" href="javascript:OnPreviousWeekLinkButtonClick ()" href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span id="Label20">&nbsp;|&nbsp;</span><a onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton'); return false;" id="SelectWeekLinkButton" class="LinkButton" href="javascript:__doPostBack('SelectWeekLinkButton','')">Select Week</a><span id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" class="LinkButton" href="javascript:OnNextWeekLinkButtonClick ()" href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
  <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span id="StatusLabel" class="StatusLabel"></span></div>
</div>
---

--
Brian Raven

Please consider the environment before printing this e-mail. This e-mail may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please advise the sender immediately by reply e-mail and delete this message and any attachments without retaining a copy. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
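The reply above validates the date with Date::Calc, which is a CPAN module rather than part of core Perl. Where installing it is not an option, a rough equivalent can be sketched with the core Time::Local module, whose timegm function croaks on out-of-range values. The helper name looks_like_valid_date is an assumption made for illustration, and this is a substitute technique, not what the reply itself uses.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: true when (year, month, day) names a real calendar
# date. timegm() dies on an impossible day or month, so the call is
# wrapped in eval and a defined result means the date was accepted.
sub looks_like_valid_date {
    my ($year, $month, $day) = @_;
    return 0 unless defined $year && defined $month && defined $day;
    return 0 if grep { !/^\d+$/ } $year, $month, $day;
    my $epoch = eval { timegm(0, 0, 0, $day, $month - 1, $year) };
    return defined $epoch ? 1 : 0;
}

my @date = split "/", "2011/10/29";
print looks_like_valid_date(@date)
    ? "Date looks OK\n"
    : "That date looks invalid\n";
```

Note that Time::Local treats two-digit and three-digit years specially, so this sketch is only meant for four-digit years like the ones in the sample page.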
RE: How to Extract a Date from a File
-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Brian Raven
Sent: 03 November 2011 10:37
To: perl Win32-users
Subject: RE: How to Extract a Date from a File

...

> Using regular expressions is not usually recommended.

Sorry, that should read "Using regular expressions is not usually recommended for parsing HTML."

--
Brian Raven
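One concrete way a regular expression can go wrong on this kind of markup: the start and end dates sit on the same line of the sample page, so a greedy .+ after StartWeekLabel runs to the end of the line and backtracks to the last date, not the first. A small sketch comparing greedy and non-greedy quantifiers on a shortened copy of that line (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Shortened copy of the sample line: both dates are on one line.
my $line = '<span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span>'
         . '<span id="Label6" style="font-weight:bold;"> - </span>'
         . '<span id="EndWeekLabel" style="font-weight:bold;">2011/11/04</span></div>';

# In list context a successful match returns its captures, so join
# builds YYYYMMDD directly (or an empty string when nothing matches).
my $greedy = join '', $line =~ m{StartWeekLabel.+>(\d{4})/(\d{2})/(\d{2})</span}si;
my $lazy   = join '', $line =~ m{StartWeekLabel.+?>(\d{4})/(\d{2})/(\d{2})</span}si;

print "greedy: $greedy\n";   # 20111104, the end date
print "lazy:   $lazy\n";     # 20111029, the start date
```

The non-greedy form happens to work here, but it still depends on the exact attribute order and layout of the page, which is the usual argument for a real HTML parser.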
RE: How to Extract a Date from a File
Thank you Brian. Your reply is indeed most eloquent.

Jon Bjornstad has offered a more traditional regexp solution that I hope to comprehend, if only to sharpen my Perl skills. (I am waiting for his response to my lack thereof.) I hope to test and share Jon's answer with the community.

For now, I will leverage the HTML library APIs.

Sincerely,

Paul

From: bra...@nyx.com
To: perl-win32-users@listserv.activestate.com
Date: Thu, 3 Nov 2011 10:37:03 +
Subject: RE: How to Extract a Date from a File

[...]
Re: How to Extract a Date from a File
    @filecontents = map {chomp; $_} @filecontents;   # <-- don't need the map construct.

    chomp @filecontents;   # does what you want

On Thu, Nov 3, 2011 at 11:44 AM, Paul Rousseau <paulrousseau...@hotmail.com> wrote:

> Thank you Brian. Your reply is indeed most eloquent.
> [...]
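To illustrate the chomp advice: chomp accepts a list, edits each element in place, and returns the total number of characters it removed. A short sketch with made-up sample lines (the array contents and the in-memory filehandle are assumptions for the demo):

```perl
use strict;
use warnings;

# Made-up sample data standing in for lines read from a file.
my @filecontents = ("first line\n", "second line\n", "no newline");

# chomp edits the array in place; no map is needed.
my $removed = chomp @filecontents;
print "removed $removed newline(s)\n";
print "[$_]\n" for @filecontents;

# Reading and chomping can also be combined in one statement.
my $text = "alpha\nbeta\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
chomp(my @lines = <$fh>);
close $fh;
print scalar(@lines), " line(s) read\n";
```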
Re: How to Extract a Date from a File
How about something like this:

    next unless m:(\d\d\d\d)/(\d\d)/(\d\d):;
    $start_date = "$1$2$3";

On Wed, Nov 2, 2011 at 4:07 PM, Paul Rousseau <paulrousseau...@hotmail.com> wrote:

> Hello Perl folks,
>
> I would like to know if there is an eloquent way of extracting a date
> string from a file.
> [...]

--
The very nucleus of Character: to do what you know you should do, when you don't want to do it. Stephen Covey
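The two-line suggestion above relies on being inside a loop over the file's lines, with each line in $_. A minimal sketch of that surrounding loop, assuming the first date found in the file is the one wanted; the helper name first_date_from and the in-memory sample are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first date found on any line read from
# the filehandle, as YYYYMMDD, or undef when no line contains a date.
sub first_date_from {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        # Skip lines with no YYYY/MM/DD date; ':' is the pattern delimiter,
        # which avoids having to escape the slashes.
        next unless $line =~ m:(\d\d\d\d)/(\d\d)/(\d\d):;
        return "$1$2$3";
    }
    return;
}

# In-memory sample standing in for the real file.
my $sample = qq{<div>\n<span id="StartWeekLabel">2011/10/29</span><span id="EndWeekLabel">2011/11/04</span>\n</div>\n};
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $start_date = first_date_from($fh);
close $fh;
print defined $start_date ? "$start_date\n" : "no date found\n";
```

Because the pattern does not mention StartWeekLabel, it picks up whichever date comes first on the first matching line, so it only gives the start date when the page keeps that ordering.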