The printable text for a particular HTML::Element is part of the content
list. The content list can have either simple text items or other
HTML::Element objects, or a mix. I think you are talking about
something like the following?
    foreach (@{$parent->content}) {
        unless ( ref($_) && $_->isa('HTML::Element') ) {
            printf "%s \n", $_;
        }
    }
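
Applied to the nested spans from your dump, the same idea might look like the sketch below. The sample HTML string is my reconstruction from the dump, not your real page, and own_text is just a hypothetical helper name. It relies only on content_list returning plain strings for text and HTML::Element objects for child tags.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumed sample markup, reconstructed from the dump in the question.
my $html = '<p class="tbtx"><span class="em">ACCA 310F '
         . '<span class="on">FOUNDATIONS OF ACCOUNTING</span></span>'
         . ' A A <b>x</b></p>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: return only the text items that sit directly
# in an element's content list, skipping child elements.
sub own_text {
    my ($element) = @_;
    return grep { !ref } $element->content_list;
}

# In scalar context look_down returns the first match: the outer span.
my $outer = $tree->look_down( _tag => 'span', class => 'em' );

# Prints the outer span's own text without the inner span's text.
print "$_\n" for own_text($outer);
```

Call $tree->delete when you are done with the tree. Matching on the class attribute as well as the tag is one way to tell the two spans apart.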
-Dgg
-----Original Message-----
From: bruce [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 9:27 AM
To: Darrell Gammill; 'Gisle Aas'
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: more specific html parsing question..
in looking at the issue in more depth.. the real question i have is how
can one take a "list" of items where the parent has descendants, and
simply "pop" off the "text" for the parent/root node without the
children???
as i stated, this is basically what i'm trying to do for now, and what
has me stymied...
it appears that this should be relatively straightforward, but i must be
missing something!
thanks
-bruce
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 6:25 AM
To: 'Darrell Gammill'; 'Gisle Aas'
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: :mechanize issues/mechanize.pm dies!!
aha!!!
so my 1st real attempt at using new mods in perl.. and i stumble on a
bug!!!.. cool... i'll make the patch.. and try to use www::mechanize...
i do have an additional problem/issue/question. i have a small situation
where i'm trying to get the text for the "parent"/top item...
i'm trying to use html::treebuilder as the parsing app...
the html is:
.
.
.
<tr class="tbon"> @0.1.1.0.0.0.2.1.0.0.0.3.1
  <td colspan=7> @0.1.1.0.0.0.2.1.0.0.0.3.1.0
    <p class="tbtx"> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0
      <span class="em"> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0.0
        "ACCA 310F "
        <span class="on"> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0.0.1
          "FOUNDATIONS OF ACCOUNTING"
      " A A "
      <b> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0.2
.
.
.
i'm trying to figure out how to access/print "ACCA 310F" as a separate
element...
using:
@span_tree = $tbtx_tree[0]->look_down("_tag"=>"span");
print "tree span = ". $span_tree[0]->parent()->dump() ."\n";
generates:
<span class="em"> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0.0
  "ACCA 310F "
  <span class="on"> @0.1.1.0.0.0.2.1.0.0.0.3.1.0.0.0.1
    "FOUNDATIONS OF ACCOUNTING"
i appear to be having an issue with my approach given that both tags are
"spans".. any way to separate them...
i can get the "FOUNDATIONS..." by simply looking at $span_tree[1]....,
but $span_tree[0] appears to contain both spans.. any way to separate
them...
any ideas/comments/criticisms/etc.. would be appreciated...
thanks
-bruce