Hi,
Don't use regular expressions to match HTML; use an HTML parser instead.
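use HTML::TreeBuilder;

# $html_content holds the HTML source of the page as a string.
my $tree = HTML::TreeBuilder->new_from_content($html_content);

# Find the <div id="product" class="product">, then the table inside it.
my $div = $tree->look_down( _tag => 'div', id => 'product', class => 'product' )
    or die "No product div found\n";
my $table = $div->look_down( _tag => 'table', class => 'prodc' )
    or die "No prodc table found\n";

# Here you can get the table components, like the first cell of each row:
for my $tr ( $table->look_down( _tag => 'tr' ) ) {
    my @td = $tr->look_down( _tag => 'td' );
    print $td[0]->as_text, "\n" if @td;    # skip rows that have no <td>
}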
You can do much more than this, including far more complex searches for HTML
elements, using HTML::TreeBuilder.
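If what you want is the table markup itself rather than the cell text, something
like the following might work. It is only a sketch: extract_product_table is a
name I made up for illustration, the sample HTML is invented, and dropping the
class attribute is my guess at what the last sed step in your pipeline is for.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): returns the markup of
# the "prodc" table inside <div id="product">, or undef if it is not found.
sub extract_product_table {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down returns the first match, or undef.
    my $div   = $tree->look_down( _tag => 'div', id => 'product' );
    my $table = $div && $div->look_down( _tag => 'table', class => 'prodc' );

    my $out;
    if ($table) {
        # Setting an attribute to undef removes it from the element.
        $table->attr( 'class', undef );

        # Arguments: entities to escape, indent string, and an empty hash
        # so that optional end tags such as </tr> and </td> are written out.
        $out = $table->as_HTML( '<>&', '  ', {} );
    }

    $tree->delete;    # release the tree once we are done with it
    return $out;
}

# Invented sample input: one unrelated table, one table inside the product div.
my $html = <<'HTML';
<html><body>
<table class="other"><tr><td>ignore me</td></tr></table>
<div id="product" class="product">
<table border="0" cellspacing="0" cellpadding="0" class="prodc" title="Product ">
<tr><td>Widget</td><td>9.99</td></tr>
</table>
</div>
</body></html>
HTML

my $table_html = extract_product_table($html);
print defined $table_html ? $table_html : "no product table found\n";
```

Because the search starts from the product div, any other tables in the file
are never considered.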
Read:
perldoc HTML::TreeBuilder
perldoc HTML::Element
--Octavian
----- Original Message -----
From: [email protected]
To: [email protected]
Sent: Tuesday, November 18, 2014 10:22 PM
Subject: Match HTML <div> ...... </dv> string over multiple
I am trying to extract a table (<table class="xxxx"><tr><td>...... until
</table>) and its content from an HTML file.
With the file I have something like this
<div id="product" class="product">
<table border="0" cellspacing="0" cellpadding="0" class="prodc"
title="Product ">
.
.
.
</table>
</div>
There could be more than one table in the file. However, I am only interested
in the table within <div id="product" class="product"> </div>.
/^.*<div id="product" class="product">.+?(<table
border="0".+?\s+<\/table>)\s*<\/div>.*$/ims
The above and the various variations I tried do not match.
I am able to easily match this using sed; however, I need to try using Perl.
This sed command works just fine:
sed -n '/<div id="product" class="product">/,/<\/table>/p' thelo826.html |sed
-n '/<table border.*/,/<\/table>/p'| sed -e 's/class=".*"//g'
Thanks
Mimi