>>>>> "Sara" == Sara  <[EMAIL PROTECTED]> writes:

Sara> I have a couple of text files with html code in them.. e.g.
Sara> ---------- Text File --------------
Sara> <html>
Sara>     <head>
Sara>         <title>This is Test File</title>
Sara>     </head>
Sara> <body>
Sara> <font size=2 face=arial>This is the test file contents<br>
Sara> <p>
Sara> blah blah blah.........
Sara> </body>
Sara> </html>

Sara> -----------------------------------------

Sara> What I want to do is to remove/delete HTML code from the text file from a
Sara> certain tag up to a certain tag.

Sara> For example, I want to delete the code completely that comes in between <head>
Sara> and </head> (including any style tags and embedded javascripts etc)

Sara> Any ideas?

This code parses HTML from stdin into an XML::LibXML document tree,
removes the "/html/head" node and everything beneath it, and writes the
modified document to stdout:

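    use strict;
    use warnings;
    use XML::LibXML;
    my $p = XML::LibXML->new or die;
    $p->recover(1);    # keep parsing despite malformed HTML
    my $d = $p->parse_html_fh(\*STDIN) or die;
    # detach <head> and its entire subtree from the document
    $_->unbindNode for $d->findnodes("/html/head");
    print $d->toStringHTML();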
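
The same idea should carry over to other spans, such as script or style
elements that sit outside the head. Here is a minimal sketch, assuming
the HTML is already in a string rather than on stdin. The helper name
strip_nodes, the extra XPath expressions, and the sample markup are
made up for illustration:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of the code above): remove every node
# matching any of the given XPath expressions and return the HTML.
sub strip_nodes {
    my ($html, @xpaths) = @_;
    my $parser = XML::LibXML->new;
    $parser->recover(1);    # keep parsing despite malformed HTML
    my $doc = $parser->parse_html_string($html);
    for my $xpath (@xpaths) {
        # unbindNode detaches the node together with its subtree
        $_->unbindNode for $doc->findnodes($xpath);
    }
    return $doc->toStringHTML;
}

# Sample input, invented for this sketch
my $html = '<html><head><title>This is Test File</title></head>'
         . '<body><p>blah<script>alert(1)</script></p></body></html>';

print strip_nodes($html, '/html/head', '//script', '//style');
```

Any XPath expression that findnodes accepts can be passed in, so "from
a certain tag up to a certain tag" becomes a matter of naming the
element to drop.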

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
