Re: How to Extract a Date from a File

2011-11-03 Thread Michael
 Hi,

 This is usually what I do...

 -
 #!/usr/bin/perl

 my $startDate;

 while (<>) {
     if ($_ =~ /StartWeekLabel.*?([\d]{4})\/([\d]{2})\/([\d]{2}).*?\/span/i) {
         $startDate = "$1$2$3";
     }
 }

 print "$startDate\n";
 --

 Call the script with the text file as a parameter:

   perl myscript.pl mytextfile.txt

 If you want to search multiple files, just add them as well:

   perl myscript.pl mytextfile.txt mytextfile2.txt etc
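
 To try the same loop without a file on disk, something like the following minimal sketch may help. The helper name find_start_date and the sample line are assumptions made for illustration, not part of the original script; the sample is modelled on the HTML posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle line by line and return the
# last StartWeekLabel date found as YYYYMMDD, or undef if none matched.
sub find_start_date {
    my ($fh) = @_;
    my $start_date;
    while (my $line = <$fh>) {
        if ($line =~ m{StartWeekLabel.*?(\d{4})/(\d{2})/(\d{2})}i) {
            $start_date = "$1$2$3";
        }
    }
    return $start_date;
}

# Assumed sample line, modelled on the HTML posted in this thread.
my $sample = qq{<span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span>\n};

# Open a read-only filehandle on the in-memory string.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!\n";
my $date = find_start_date($fh);
close $fh;

print defined $date ? "$date\n" : "no date found\n";
```

 Passing a filehandle rather than relying on <> keeps the sub usable with any input source, at the cost of opening the files yourself.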

 /Michael
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: How to Extract a Date from a File

2011-11-03 Thread Brian Raven
From: perl-win32-users-boun...@listserv.activestate.com 
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul 
Rousseau
Sent: 02 November 2011 16:08
To: perl Win32-users
Subject: How to Extract a Date from a File

 Hello Perl folks,


 I would like to know if there is an eloquent way of extracting a date string 
 from a file.

 My code goes like this:

   open (INFILE, "$sourcedir\\$filename") || die "Can not open $sourcedir\\$filename $!\n";
   @filecontents = <INFILE>;
   close INFILE;
   @filecontents = map {chomp; $_} @filecontents;

 #
 # Within the file contents, look for the text, CurrentWeekLabel
 #
 # Here is a text sample.
 #
 #   <div style="TEXT-ALIGN: center; min-width: 750px">
 #   <div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span
 #   id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel"
 #   style="font-weight:bold;">2011/10/29</span><span id="Label6"
 #   style="font-weight:bold;"> - </span><span id="EndWeekLabel"
 #   style="font-weight:bold;">2011/11/04</span></div>
 #   <div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a
 #   id="PreviousWeekLinkButton" class="LinkButton"
 #   href="javascript:OnPreviousWeekLinkButtonClick ()"
 #   href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span
 #   id="Label20">&nbsp;|&nbsp;</span><a
 #   onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton');
 #   return false;" id="SelectWeekLinkButton" class="LinkButton"
 #   href="javascript:__doPostBack('SelectWeekLinkButton','')">Select Week</a><span
 #   id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" class="LinkButton"
 #   href="javascript:OnNextWeekLinkButtonClick ()"
 #   href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
 #   <div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span
 #   id="StatusLabel" class="StatusLabel"></span></div>
 #   </div>
 #
 # Obtain the year, month and day following the text, StartWeekLabel
 #
  @ans = grep (/StartWeekLabel.+\>(\d{4})\/(\d{2})\/(\d{2})\<\/span/si, @filecontents);
 #
 # Build the start date from the matches.
 #
 $start_date = $1 . $2 . $3;

 I was wondering if there was a neat way to avoid using @ans as a temporary 
 variable, and extract the
 2011/10/29 straight into $start_date so that $start_date = 20111029

Using regular expressions is not usually recommended. Prefer to use the modules 
that specialise in doing that. Also, there may be alternate ways to extract the 
date elements, and modules to validate them. For example...

---
use strict;
use warnings;

use HTML::TreeBuilder;
use Date::Calc qw{check_date};

my $root = HTML::TreeBuilder->new_from_file(*DATA);
defined $root or die "Failed to parse\n";
my $element = $root->look_down("id", "StartWeekLabel");
defined $element or die "Failed to locate id=StartWeekLabel\n";
my $rawdate = $element->as_trimmed_text();
print "Raw date '$rawdate'\n";
# Split YYYY/MM/DD into (year, month, day) for check_date.
my @date = split "/", $rawdate;
if ((check_date(@date))) {
    print "Date looks OK: '", @date, "'\n";
}
else {
    print "That date looks invalid\n";
}

__DATA__
<div style="TEXT-ALIGN: center; min-width: 750px">
<div style="OVERFLOW: hidden; HEIGHT: 20px; TEXT-ALIGN: center"><span
id="CurrentWeekLabel">Week Of: </span><span id="StartWeekLabel"
style="font-weight:bold;">2011/10/29</span><span id="Label6"
style="font-weight:bold;"> - </span><span id="EndWeekLabel"
style="font-weight:bold;">2011/11/04</span></div>
<div style="OVERFLOW: hidden; HEIGHT: 24px; TEXT-ALIGN: center"><a
id="PreviousWeekLinkButton" class="LinkButton"
href="javascript:OnPreviousWeekLinkButtonClick ()"
href="javascript:__doPostBack('PreviousWeekLinkButton','')">Prev</a><span
id="Label20">&nbsp;|&nbsp;</span><a
onclick="SelectWeekButtonClick('PopupCalendar1', 'SelectWeekLinkButton');
return false;" id="SelectWeekLinkButton" class="LinkButton"
href="javascript:__doPostBack('SelectWeekLinkButton','')">Select Week</a><span
id="Label8">&nbsp;|&nbsp;</span><a id="NextWeekLinkButton" class="LinkButton"
href="javascript:OnNextWeekLinkButtonClick ()"
href="javascript:__doPostBack('NextWeekLinkButton','')">Next</a></div>
<div style="OVERFLOW: hidden; OVERFLOW:visible; TEXT-ALIGN: center"><span
id="StatusLabel" class="StatusLabel"></span></div>
</div>
---

--
Brian Raven




Please consider the environment before printing this e-mail.

This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient or have received this e-mail in error, please advise 
the sender immediately by reply e-mail and delete this message and any 
attachments without retaining a copy.

Any unauthorised copying, disclosure or distribution of the material in this 
e-mail is strictly forbidden.


RE: How to Extract a Date from a File

2011-11-03 Thread Brian Raven

 -Original Message-
 From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-
 win32-users-boun...@listserv.activestate.com] On Behalf Of Brian Raven
 Sent: 03 November 2011 10:37
 To: perl Win32-users
 Subject: RE: How to Extract a Date from a File

 From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-
 win32-users-boun...@listserv.activestate.com] On Behalf Of Paul
 Rousseau
 Sent: 02 November 2011 16:08
 To: perl Win32-users
 Subject: How to Extract a Date from a File

 ...

 Using regular expressions is not usually recommended.

Sorry, that should read "Using regular expressions is not usually recommended 
for parsing HTML."
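
One way to see why, as a rough sketch only: the markup below is an assumed variant of the span from this thread, wrapped so that the id attribute and the date sit on different lines. A line-by-line regular expression then finds nothing, while matching the whole document with /s still works. A parser such as HTML::TreeBuilder is not affected by the wrapping at all.

```perl
use strict;
use warnings;

# Assumed markup: the span from this thread, wrapped across two lines.
my $html = <<'END_HTML';
<span id="StartWeekLabel"
 style="font-weight:bold;">2011/10/29</span>
END_HTML

# Line-by-line matching: no single line holds both the id and the date,
# so grep (in scalar context, a count) finds no matching line.
my @lines     = split /\n/, $html;
my $line_hits = grep { m{StartWeekLabel.*?(\d{4})/(\d{2})/(\d{2})} } @lines;

# Whole-document matching with /s, so that "." can also cross newlines.
my $whole = '';
if ($html =~ m{StartWeekLabel.*?(\d{4})/(\d{2})/(\d{2})}s) {
    $whole = "$1$2$3";
}

print "lines matched: $line_hits\n";
print "whole-document match: $whole\n";
```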


--
Brian Raven






RE: How to Extract a Date from a File

2011-11-03 Thread Paul Rousseau

Thank you Brian.
 
Your reply is indeed most eloquent.
 
Jon Bjornstad has offered a more traditional regexp solution that I hope to 
comprehend, if only to sharpen my Perl skills. (I am waiting for his response 
to my lack thereof.)

I hope to test and share Jon's answer with the community. For now, I will 
leverage the HTML library APIs.
 
Sincerely,
 
Paul
 



Re: How to Extract a Date from a File

2011-11-03 Thread Phil Rafferty Sr.
@filecontents = map {chomp; $_} @filecontents;   # <-- don't need the map construct.

chomp @filecontents;  # does what you want
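
A small sketch comparing the two forms, using made-up sample lines (the array names are assumptions for illustration). chomp given an array removes the trailing newline from every element in place, and leaves elements without one untouched.

```perl
use strict;
use warnings;

# Made-up sample data; the last element has no trailing newline.
my @with_map   = ("first\n", "second\n", "third");
my @with_chomp = @with_map;    # independent copy

# The map construct from the original post.
@with_map = map { chomp; $_ } @with_map;

# The shorter form: chomp applied to the whole array, in place.
chomp @with_chomp;

print join(",", @with_chomp), "\n";
```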



Re: How to Extract a Date from a File

2011-11-02 Thread will trillich
How about something like this:

  next unless m:(\d\d\d\d)/(\d\d)/(\d\d):;
  $start_date = "$1$2$3";
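
The two lines above need to sit inside a loop over the file's lines. If the goal is to avoid the temporary array altogether, one possible sketch is to match against the whole contents in list context and join the captures. The $contents string here is an assumed, shortened stand-in for the file, not the real page.

```perl
use strict;
use warnings;

# Assumed, shortened stand-in for the file contents.
my $contents = qq{<span id="CurrentWeekLabel">Week Of: </span>}
             . qq{<span id="StartWeekLabel" style="font-weight:bold;">2011/10/29</span>}
             . qq{<span id="EndWeekLabel" style="font-weight:bold;">2011/11/04</span>};

# In list context a successful match returns its captures, so join can
# glue them together without $1, $2, $3 or a temporary array.
# If nothing matches, the list is empty and $start_date is ''.
my $start_date = join '', $contents =~ m{StartWeekLabel.*?(\d{4})/(\d{2})/(\d{2})}s;

print "$start_date\n";
```

Anchoring on StartWeekLabel keeps the match from picking up the EndWeekLabel date, which a bare date pattern could do if the lines were ordered differently.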


-- 
The very nucleus of Character: to do what you know you should do, when you
don't want to do it. Stephen Covey