Strip HTML from files in a directory

2008-10-29 Thread bdy
Does anyone know if there's a way to use an HTML stripper in Perl to
scrub the HTML from all files in a specified directory? If so, would
you point me in the correct direction.

Thanks,


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/




RE: Strip HTML from files in a directory

2008-10-29 Thread Stewart Anderson
> -----Original Message-----
> From: bdy [mailto:[EMAIL PROTECTED]
> Sent: 29 October 2008 15:14
> To: beginners@perl.org
> Subject: Strip HTML from files in a directory
> 
> Does anyone know if there's a way to use an HTML stripper in Perl to
> scrub the HTML from all files in a specified directory? If so, would
> you point me in the correct direction.
> 
> Thanks,
> 
> 


<* pokes head above parapet *> - Stripper?  








Re: Strip HTML from files in a directory

2008-10-29 Thread Rob Dixon
bdy wrote:
>
> Does anyone know if there's a way to use an HTML stripper in Perl to
> scrub the HTML from all files in a specified directory? If so, would
> you point me in the correct direction.

I would recommend something like

  use HTML::TreeBuilder;

  my $tree = HTML::TreeBuilder->new_from_content($html);
  print $tree->as_text;

but the details depend on your application.
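The snippet above assumes that $html already holds the document as a string. A minimal sketch of one way to get there, reading a file whole and handing it to the same two calls; the helper names slurp and html_to_text and the script name html2text.pl are made up for illustration, and HTML::TreeBuilder has to be installed from CPAN:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into one string (the $html of the snippet above).
sub slurp {
    my ($path) = @_;
    open my $in, "<", $path
        or die "could not open $path: $!";
    local $/;                   # no record separator: read everything at once
    my $content = <$in>;
    close $in;
    return $content;
}

# Return the text of an HTML string with the markup removed.
sub html_to_text {
    my ($html) = @_;
    require HTML::TreeBuilder;  # CPAN module, not part of core Perl
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $tree->as_text;
    $tree->delete;              # release the parse tree's memory
    return $text;
}

# Hypothetical usage: perl html2text.pl page.html
print html_to_text(slurp($ARGV[0])), "\n" if @ARGV;
```

Note that as_text simply joins the text nodes, so paragraph breaks from the original page are not preserved.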

HTH,

Rob





Re: Strip HTML from files in a directory

2008-11-12 Thread bdy
On Oct 29, 10:57 pm, [EMAIL PROTECTED] (Rob Dixon) wrote:
> bdy wrote:
>
> > Does anyone know if there's a way to use an HTML stripper in Perl to
> > scrub the HTML from all files in a specified directory? If so, would
> > you point me in the correct direction.
>
> I would recommend something like
>
>   use HTML::TreeBuilder;
>
>   my $tree = HTML::TreeBuilder->new_from_content($html);
>   print $tree->as_text;
>
> but the details depend on your application.
>
> HTH,
>
> Rob

Sorry, I should have mentioned I was an ultra-beginner. Aside from
using that in a .pl file, how else could I execute that for multiple
files in a directory?






Re: Strip HTML from files in a directory

2008-11-12 Thread Chas. Owens
On Wed, Nov 12, 2008 at 15:44, Chas. Owens <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 11, 2008 at 17:17, bdy <[EMAIL PROTECTED]> wrote:
> snip
>> Sorry, I should have mentioned I was an ultra-beginner. Aside from
>> using that in a .pl file, how else could I execute that for multiple
>> files in a directory?
> snip
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use HTML::TreeBuilder;
>
> die "usage: $0 FILE(s)\n" unless @ARGV > 0;
>
> for my $file (@ARGV) {
>    my $tree = HTML::TreeBuilder->new;
>    $tree->parse_file($file);
>    print $tree->as_text;
> }

Whoops, that would have printed every file to stdout. This one writes
a new .txt file for each input file.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

die "usage: $0 FILE(s)\n" unless @ARGV > 0;

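for my $file (@ARGV) {
   # build a parse tree from the HTML in $file
   my $tree = HTML::TreeBuilder->new;
   $tree->parse_file($file);
   # write the text content to "$file.txt"
   open my $fh, ">", "$file.txt"
      or die "could not open $file.txt: $!";
   print $fh $tree->as_text;
   close $fh;
   $tree->delete;   # free the tree before the next file
}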
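The script above takes file names as arguments, while the original question asked about every file in a directory. A minimal sketch of one way to bridge that, using opendir to collect the files first. The sub names html_files_in and strip_file are made up for illustration, and the filter on names ending in .html or .htm is an assumption about what the directory holds:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Spec;

# Return the paths of the regular files in $dir whose names end in
# .html or .htm (case-insensitive), sorted by path.
sub html_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir
        or die "could not open directory $dir: $!";
    my @files = sort
                grep { -f $_ }
                map  { File::Spec->catfile($dir, $_) }
                grep { /\.html?\z/i }
                readdir $dh;
    closedir $dh;
    return @files;
}

# Write the text content of $file to "$file.txt".
sub strip_file {
    my ($file) = @_;
    require HTML::TreeBuilder;    # CPAN module, not part of core Perl
    my $tree = HTML::TreeBuilder->new;
    defined $tree->parse_file($file)
        or die "could not parse $file: $!";
    open my $out, ">", "$file.txt"
        or die "could not open $file.txt: $!";
    print {$out} $tree->as_text, "\n";
    close $out
        or die "could not close $file.txt: $!";
    $tree->delete;                # free the tree before the next file
}

# Run only when a directory is given on the command line.
if (@ARGV) {
    strip_file($_) for html_files_in($ARGV[0]);
}
```

Saved under a hypothetical name such as strip_dir.pl, this would be run as "perl strip_dir.pl /path/to/dir". On a Unix-like shell the script as posted can also be given the whole directory with a wildcard, for example "perl strip.pl /path/to/dir/*.html", since the shell expands the pattern into the file list.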



-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Strip HTML from files in a directory

2008-11-12 Thread Chas. Owens
On Tue, Nov 11, 2008 at 17:17, bdy <[EMAIL PROTECTED]> wrote:
snip
> Sorry, I should have mentioned I was an ultra-beginner. Aside from
> using that in a .pl file, how else could I execute that for multiple
> files in a directory?
snip

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

die "usage: $0 FILE(s)\n" unless @ARGV > 0;

for my $file (@ARGV) {
   my $tree = HTML::TreeBuilder->new;
   $tree->parse_file($file);
   print $tree->as_text;
}


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
