Re: HTML Parsing Question
Hi.

The big issue is the ignore unknown tag setting. Also, the module does not like a missing and . Also, this isn't HTML. You might be better served by using XML::Simple. printf is your friend.

use HTML::TreeBuilder;

$html = ' . ';

$tree = HTML::TreeBuilder->new();
$tree->ignore_unknown(0);
$tree->warn(1);
$tree->parse($html);
$tree->eof;

@Ans = $tree->look_down('_tag' => 'button');
foreach $button (@Ans)
{
    # prints "button is Button1"
    printf "button is %s\n", $button->attr('name');

    # can also use ${$button}{_tag}
    # prints "button tag is button"
    printf "button tag is %s\n", $button->tag;

    # can also use ${$button}{_parent}
    # prints "button parent hash pointer is HTML::Element=HASH(0x19b9224)"
    printf "button parent hash pointer is %s\n", $button->parent;

    foreach $key (keys %{$button->parent})
    {
        # prints every key of the parent hash, including its attributes (see output below)
        print "key is $key\n";
    }

    # prints "button _parent value is HTML::Element=HASH(0x19b9224)"
    printf "button _parent value is %s\n", ${$button}{_parent};

    # prints "button _tag value is button"
    printf "button _tag value is %s\n", ${$button}{_tag};

    $parent = $button->parent;

    # prints "parent tag is group" once unknown tags are no longer ignored
    printf "parent tag is %s\n", ${$parent}{_tag};

    # prints "parent content is ARRAY(0x19b90c8)"
    printf "parent content is %s\n", ${$parent}{_content};

    # foreach $key (@{${$parent}{_content}}) {
    #     # print "content item is $key";
    # }
    #print "parent name is " . $parent->attr('name');
    $groupid = $button->look_up('_tag' => 'group');
    # print "group id is " . $groupid;
}
# $tree->dump;
print $tree->dump; #as_HTML;

Output:

button is Button1
button tag is button
button parent hash pointer is HTML::Element=HASH(0x19b9224)
key is _parent
key is visible
key is linkanimations
key is linkbaseobject
key is wallpaper
key is tooltiptext
key is name
key is linksize
key is linkconnections
key is isreferenceobject
key is _content
key is exposetovba
key is _tag
button _parent value is HTML::Element=HASH(0x19b9224)
button _tag value is button
parent tag is group
parent content is ARRAY(0x19b90c8)
button is Button24
button tag is button
button parent hash pointer is HTML::Element=HASH(0x19b9224)
key is _parent
key is visible
key is linkanimations
key is linkbaseobject
key is wallpaper
key is tooltiptext
key is name
key is linksize
key is linkconnections
key is isreferenceobject
key is _content
key is exposetovba
key is _tag
button _parent value is HTML::Element=HASH(0x19b9224)
button _tag value is button
parent tag is group
parent content is ARRAY(0x19b90c8)

--
REMEMBER THE WORLD TRADE CENTER ---=< WTC 911 >=--
"...ne cede malis"

0100
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
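For readers who only want the group name for each button, the approach in the message above can be condensed into a short sketch. The markup below is a hypothetical stand-in for the snippet that was lost from the archive (only the tag names and the name attributes are taken from the thread), and the helper name group_names_for_buttons is made up for illustration. It assumes HTML::TreeBuilder is installed and relies on ignore_unknown(0) keeping the non-HTML group element in the tree, as the output above suggests.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the elided snippet; only tags and name attributes matter.
my $html = <<'END_HTML';
<html><head></head><body>
<group name="Group1">
  <button name="Button1"></button>
  <button name="Button24"></button>
</group>
</body></html>
END_HTML

# Return a hash mapping each button's name to the name of its enclosing <group>.
sub group_names_for_buttons {
    my ($markup) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0);    # keep non-HTML tags such as <group> in the tree
    $tree->parse($markup);
    $tree->eof;

    my %group_of;
    for my $button ( $tree->look_down( _tag => 'button' ) ) {
        # look_up walks from the button towards the root and returns
        # the nearest element whose tag is 'group', if there is one.
        my $group = $button->look_up( _tag => 'group' );
        $group_of{ $button->attr('name') } = $group ? $group->attr('name') : undef;
    }

    $tree->delete;               # release the tree's circular references
    return %group_of;
}

my %group_of = group_names_for_buttons($html);
for my $name ( sort keys %group_of ) {
    printf "button %s is in group %s\n", $name, $group_of{$name} // '(none)';
}
```

Using look_up rather than parent means the lookup should still work if another element ever sits between the button and its group.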
RE: HTML Parsing Question
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul Rousseau
Sent: 04 February 2010 16:44
To: perl Win32-users
Subject: HTML Parsing Question

> Hello,
>
> I have an HTML file that contains the following snippet:
>
> isReferenceObject="true" linkSize="true" linkConnections="true"
> linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Group1">
> exposeToVba="notExposed" isReferenceObject="true" linkSize="true" linkConnections="true"
> linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Button1" style="recessed"
> captureCursor="false" highlightOnFocus="true" tabIndex="1">
> repeatRate="0.25"/>
> foreColor="black">
> strikethrough="false" caption="Overviews"/>
>
>
> backStyle="solid" foreColor="black">
> underline="false" strikethrough="false" caption=""/>
>
>
>
> toolTipText="" exposeToVba="notExposed" isReferenceObject="true" linkSize="true"
> linkConnections="true" linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Button24"
> style="recessed" captureCursor="false" highlightOnFocus="true" tabIndex="2">
> repeatRate="0.25"/>
> foreColor="black">
> strikethrough="false" caption="Oil Treating"/>
>
>
> backStyle="solid" foreColor="black">
> underline="false" strikethrough="false" caption=""/>
>
>
>
> .
> .
> .

I'm not an HTML expert, but is 'group' a valid tag? The parser seems to be silently ignoring it. It works if you change group to form.

> I want to obtain the parent name for all "button" tags. I am having trouble extracting the parent
> name. For example, I can find "Button1" and "Button24" but I can not get to either item's parent name,
> "Group1".
> Here is my code so far. (I believe it's something trivial; I feel I am so close.)
>
> foreach $filename (@filenames)
> {
>     print "filename is $filename";
>     $tree = HTML::TreeBuilder->new();
>     $tree->parse_file("$sourcedir\\$filename");
>     @Ans = $tree->look_down('_tag' => 'button');
>     foreach $button (@Ans)
>     {
>         print "button is " . $button->attr('name'); # prints "button is Button1"
>         print "button tag is " . $button->tag; # can also use ${$button}{_tag} # prints "button tag is button"
>         print "button parent hash pointer is " . $button->parent; # can also use ${$button}{_parent}
>         # prints "button parent hash pointer is HTML::Element=HASH(0x1afa5c4)"
>         #
>         #print Dumper($button->parent);
>         #
>         foreach $key (keys %{$button->parent})
>         {
>             print "key is $key"; # prints four keys: _parent, _content, _tag, _implicit
>         }
>         print "button _parent value is " . ${$button}{_parent};
>         # prints "button _parent value is HTML::Element=HASH(0x1afa5c4)"
>         print "button _tag value is " . ${$button}{_tag}; # prints "button _tag value is button"
>
>         $parent = $button->parent;
>         print "parent tag is " . ${$parent}{_tag}; # prints "parent tag is body"
>         print "parent content is " . ${$parent}{_content}; # prints "parent content is ARRAY(0x1afa7b0)"
>         foreach $key (@{${$parent}{_content}})
>         {
>             #print "content item is $key";
>         }
>         #print "parent name is " . $parent->attr('name');
>         $groupid = $button->look_up('_tag', => 'group');
>         # print "group id is " . $groupid;
>     }
>     # $tree->dump;
>     # print $tree->as_HTML;
> }

Apart from your code being poorly laid out (which may well be the result of emailing), you appear either not to be using 'use strict;' or to be declaring variables in a larger scope than necessary. Both are not recommended, if you will excuse the double negative.

Always code with 'use strict; use warnings;' at the top of your code, and declare variables in the smallest scope necessary.

HTH

--
Brian Raven
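The observation in the reply above (the parser silently drops 'group', while 'form' works) can be checked with a small sketch written in the recommended style: 'use strict; use warnings;' and lexical variables declared where they are used. The markup is a hypothetical stand-in for the original file, and button_parent_tags is an illustrative helper name, not something from the thread. The parser is left at its default settings on purpose.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse with the parser's default settings and return the tag name
# of each button's immediate parent element.
sub button_parent_tags {
    my ($markup) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($markup);
    $tree->eof;

    my @tags = map { $_->parent->tag } $tree->look_down( _tag => 'button' );

    $tree->delete;    # release the tree's circular references
    return @tags;
}

# Hypothetical markup standing in for the original file.
my $with_group = '<html><head></head><body>'
    . '<group name="Group1"><button name="Button1"></button></group>'
    . '</body></html>';

# Same markup with the unknown <group> tag swapped for the known <form> tag.
( my $with_form = $with_group ) =~ s/\bgroup\b/form/g;

printf "with <group>: button parent is <%s>\n", join ',', button_parent_tags($with_group);
printf "with <form>:  button parent is <%s>\n", join ',', button_parent_tags($with_form);
```

If renaming the tag in the source file is not an option, calling ignore_unknown(0) on the tree before parsing is the other route mentioned in this thread.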
HTML Parsing Question
Hello,

I have an HTML file that contains the following snippet:

.
.
.

I want to obtain the parent name for all "button" tags. I am having trouble extracting the parent name. For example, I can find "Button1" and "Button24" but I can not get to either item's parent name, "Group1".

Here is my code so far. (I believe it's something trivial; I feel I am so close.)

foreach $filename (@filenames)
{
    print "filename is $filename";
    $tree = HTML::TreeBuilder->new();
    $tree->parse_file("$sourcedir\\$filename");
    @Ans = $tree->look_down('_tag' => 'button');
    foreach $button (@Ans)
    {
        print "button is " . $button->attr('name'); # prints "button is Button1"
        print "button tag is " . $button->tag; # can also use ${$button}{_tag} # prints "button tag is button"
        print "button parent hash pointer is " . $button->parent; # can also use ${$button}{_parent}
        # prints "button parent hash pointer is HTML::Element=HASH(0x1afa5c4)"
        #
        #print Dumper($button->parent);
        #
        foreach $key (keys %{$button->parent})
        {
            print "key is $key"; # prints four keys: _parent, _content, _tag, _implicit
        }
        print "button _parent value is " . ${$button}{_parent};
        # prints "button _parent value is HTML::Element=HASH(0x1afa5c4)"
        print "button _tag value is " . ${$button}{_tag}; # prints "button _tag value is button"

        $parent = $button->parent;
        print "parent tag is " . ${$parent}{_tag}; # prints "parent tag is body"
        print "parent content is " . ${$parent}{_content}; # prints "parent content is ARRAY(0x1afa7b0)"
        foreach $key (@{${$parent}{_content}})
        {
            #print "content item is $key";
        }
        #print "parent name is " . $parent->attr('name');
        $groupid = $button->look_up('_tag', => 'group');
        # print "group id is " . $groupid;
    }
    # $tree->dump;
    # print $tree->as_HTML;
}

Any help would be appreciated.

Thank you

Paul Rousseau