Re: HTML Parsing Question

2010-02-05 Thread Chris Wagner
Hi. The big issue is the ignore_unknown setting: it defaults to true, so the
module silently drops tags it does not recognize, such as group. Also the module does
not like a missing  and .

Also, this isn't HTML. You might be better served by using XML::Simple.
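If the file really is XML, a minimal XML::Simple sketch might look like the following. It assumes well-formed input; the <display> root and the cut-down <group>/<button> markup are hypothetical stand-ins, and only the group/button nesting and the name values come from this thread.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical cut-down markup: only the group/button nesting and the
# name values come from the thread.
my $xml = <<'END';
<display>
  <group name="Group1">
    <button name="Button1" tabIndex="1"/>
    <button name="Button24" tabIndex="2"/>
  </group>
</display>
END

# KeyAttr => [] stops XML::Simple folding elements into hashes keyed on
# their "name" attribute; ForceArray keeps group and button as arrays
# even when only one of them is present.
my $data = XMLin($xml, KeyAttr => [], ForceArray => ['group', 'button']);

my @pairs;    # [button name, group name]
for my $group (@{ $data->{group} }) {
    for my $button (@{ $group->{button} || [] }) {
        push @pairs, [ $button->{name}, $group->{name} ];
        printf "%s is in %s\n", $button->{name}, $group->{name};
    }
}
```

KeyAttr => [] matters here: by default XML::Simple folds repeated elements that carry a name, key or id attribute into a hash keyed on that value, which would turn the buttons into a hash keyed by Button1/Button24 instead of a list.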
Printf is your friend.
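To illustrate the printf point with core Perl only: the original code builds each line with print and concatenation and never adds a newline, so results run together. The name and tag values below are hard-coded stand-ins for $button->attr('name') and $button->tag.

```perl
use strict;
use warnings;

# Hard-coded stand-ins for $button->attr('name') and $button->tag.
my $name = 'Button1';
my $tag  = 'button';

# print + concatenation with no "\n": both results land on one line,
# "button is Button1button tag is button".
print "button is " . $name;
print "button tag is " . $tag;
print "\n";

# printf keeps the format string and the newline together.
printf "button is %s\n", $name;
printf "button tag is %s\n", $tag;

# sprintf builds the same text without printing it, e.g. for logging.
my $line = sprintf "button is %s\n", $name;
```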

use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '

.


';

my $tree = HTML::TreeBuilder->new();
$tree->ignore_unknown(0);   # keep tags TreeBuilder doesn't know, such as group
$tree->warn(1);             # emit warnings for syntax errors while parsing
$tree->parse($html);
$tree->eof;

my @Ans = $tree->look_down('_tag' => 'button');
foreach my $button (@Ans) {
    printf "button is %s\n", $button->attr('name');   # prints "button is Button1"
    printf "button tag is %s\n", $button->tag;
    # can also use ${$button}{_tag}; prints "button tag is button"
    printf "button parent hash pointer is %s\n", $button->parent;
    # can also use ${$button}{_parent}
    # prints "button parent hash pointer is HTML::Element=HASH(0x19b9224)"

    foreach my $key (keys %{$button->parent}) {
        print "key is $key\n";
        # prints the parent's internal keys (_parent, _content, _tag)
        # and its attributes (name, visible, tooltiptext, ...)
    }
    printf "button _parent value is %s\n", ${$button}{_parent};
    # prints "button _parent value is HTML::Element=HASH(0x19b9224)"
    printf "button _tag value is %s\n", ${$button}{_tag};
    # prints "button _tag value is button"

    my $parent = $button->parent;
    printf "parent tag is %s\n", ${$parent}{_tag};
    # prints "parent tag is group"
    printf "parent content is %s\n", ${$parent}{_content};
    # prints "parent content is ARRAY(0x19b90c8)"
    #   foreach my $item (@{${$parent}{_content}}) {
    #       print "content item is $item\n";
    #   }

    #print "parent name is " . $parent->attr('name');

    my $groupid = $button->look_up('_tag' => 'group');
    #  print "group id is " . $groupid;
}

$tree->dump;   # or: print $tree->as_HTML;


button is Button1
button tag is button
button parent hash pointer is HTML::Element=HASH(0x19b9224)
key is _parent
key is visible
key is linkanimations
key is linkbaseobject
key is wallpaper
key is tooltiptext
key is name
key is linksize
key is linkconnections
key is isreferenceobject
key is _content
key is exposetovba
key is _tag
button _parent value is HTML::Element=HASH(0x19b9224)
button _tag value is button
parent tag is group
parent content is ARRAY(0x19b90c8)
button is Button24
button tag is button
button parent hash pointer is HTML::Element=HASH(0x19b9224)
key is _parent
key is visible
key is linkanimations
key is linkbaseobject
key is wallpaper
key is tooltiptext
key is name
key is linksize
key is linkconnections
key is isreferenceobject
key is _content
key is exposetovba
key is _tag
button _parent value is HTML::Element=HASH(0x19b9224)
button _tag value is button
parent tag is group
parent content is ARRAY(0x19b90c8)
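The script above stops just short of printing the group name. A minimal sketch of that last step, assuming ignore_unknown(0) as in the script: look_up scans each button and then its ancestors, returns the first group element, and attr reads its name. The markup is a hypothetical cut-down stand-in wrapped in explicit html/body tags, and button_groups is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical cut-down markup: only the group/button nesting and the
# name values come from the thread.
my $markup = <<'END';
<html><body>
<group name="Group1">
  <button name="Button1" tabIndex="1"></button>
  <button name="Button24" tabIndex="2"></button>
</group>
</body></html>
END

# Hypothetical helper: returns [button name, group name] pairs.
sub button_groups {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0);    # keep <group> instead of dropping it
    $tree->parse($html);
    $tree->eof;

    my @pairs;
    for my $button ($tree->look_down('_tag' => 'button')) {
        # look_up scans $button and then its ancestors, returning the
        # first element whose tag is "group" (undef if none matches).
        my $group = $button->look_up('_tag' => 'group');
        push @pairs, [ $button->attr('name'),
                       $group ? $group->attr('name') : '(none)' ];
    }
    $tree->delete;               # release the tree explicitly
    return @pairs;
}

printf "%s is in %s\n", @$_ for button_groups($markup);
```

Using look_up rather than $button->parent keeps working if the buttons end up nested one level deeper, for example inside another unknown wrapper element.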





___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: HTML Parsing Question

2010-02-04 Thread Brian Raven

From: perl-win32-users-boun...@listserv.activestate.com
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
Paul Rousseau
Sent: 04 February 2010 16:44
To: perl Win32-users
Subject: HTML Parsing Question

> Hello,
>
> I have an HTML file that contains the following snippet:
>
>  isReferenceObject="true" linkSize="true" linkConnections="true"
> linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Group1">
>  exposeToVba="notExposed" isReferenceObject="true" linkSize="true"
> linkConnections="true" linkAnimations="linkWithoutExpression"
> linkBaseObject="System_Bar.Button1" style="recessed"
> captureCursor="false" highlightOnFocus="true" tabIndex="1">
>  repeatRate="0.25"/>
>  foreColor="black">
>  strikethrough="false" caption="Overviews"/>
>
>
>  backStyle="solid" foreColor="black">
>  underline="false" strikethrough="false" caption=""/>
>
>
>
>  toolTipText="" exposeToVba="notExposed" isReferenceObject="true"
> linkSize="true" linkConnections="true" linkAnimations="linkWithoutExpression"
> linkBaseObject="System_Bar.Button24" style="recessed"
> captureCursor="false" highlightOnFocus="true" tabIndex="2">
>  repeatRate="0.25"/>
>  foreColor="black">
>  strikethrough="false" caption="Oil Treating"/>
>
>
>  backStyle="solid" foreColor="black">
>  underline="false" strikethrough="false" caption=""/>
>
>
>
>   .
>   .
>   .
>
> 

I'm not an HTML expert, but is 'group' a valid tag? The parser seems to
be silently ignoring it. It works if you change group to form.
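That diagnosis can be checked directly. This sketch (hypothetical markup and helper name) parses the same fragment twice and reports the tag of the first button's parent: with the default settings the unknown group tag is discarded and the button lands in body, while ignore_unknown(0) keeps it, matching the outputs reported in this thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical minimal markup; only the group/button nesting comes from the thread.
my $markup = '<html><body><group name="Group1">'
           . '<button name="Button1"></button>'
           . '</group></body></html>';

# Hypothetical helper: parse $html and return the tag of the first button's parent.
sub first_button_parent {
    my ($html, $keep_unknown) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0) if $keep_unknown;   # default drops unknown tags
    $tree->parse($html);
    $tree->eof;
    my $button = $tree->look_down('_tag' => 'button');
    my $tag = $button ? $button->parent->tag : undef;
    $tree->delete;
    return $tag;
}

printf "default:           parent is %s\n", first_button_parent($markup, 0);
printf "ignore_unknown(0): parent is %s\n", first_button_parent($markup, 1);
```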
 
> I want to obtain the parent name for all "button" tags. I am having trouble
> extracting the parent name. For example, I can find "Button1" and "Button24"
> but I can not get to either item's parent name, "Group1".
>
> Here is my code so far. (I believe it's something trivial; I feel I am so close.)
>
> foreach $filename (@filenames)
>   {
>   print "filename is $filename";
>   $tree = HTML::TreeBuilder->new();
>   $tree->parse_file("$sourcedir\\$filename");
>   @Ans = $tree->look_down('_tag' => 'button');
>   foreach $button (@Ans)
>     {
>     print "button is " . $button->attr('name');   # prints "button is Button1"
>     print "button tag is " . $button->tag;
>     # can also use ${$button}{_tag}; prints "button tag is button"
>     print "button parent hash pointer is " . $button->parent;
>     # can also use ${$button}{_parent}
>     # prints "button parent hash pointer is HTML::Element=HASH(0x1afa5c4)"
>     #
>     #print Dumper($button->parent);
>     #
>     foreach $key (keys %{$button->parent})
>       {
>       print "key is $key";   # prints four keys: _parent, _content, _tag, _implicit
>       }
>     print "button _parent value is " . ${$button}{_parent};
>     # prints "button _parent value is HTML::Element=HASH(0x1afa5c4)"
>     print "button _tag value is " . ${$button}{_tag};   # prints "button _tag value is button"
>
>     $parent = $button->parent;
>     print "parent tag is " . ${$parent}{_tag};   # prints "parent tag is body"
>     print "parent content is " . ${$parent}{_content};
>     # prints "parent content is ARRAY(0x1afa7b0)"
>     foreach $key (@{${$parent}{_content}})
>       {
>       #print "content item is $key";
>       }
>     #print "parent name is " . $parent->attr('name');
>     $groupid = $button->look_up('_tag', => 'group');
>     #  print "group id is " . $groupid;
>     }
>   #   $tree->dump;
>   # print $tree->as_HTML;
>   }

Apart from your code being poorly laid out (which may well be the result
of emailing), you appear to either not be using 'use strict;' or you are
declaring variables in a larger scope than necessary. Both of which are
not recommended, if you will excuse the double negative. Always code
with 'use strict; use warnings;' at the top of your code, and declare
variables in the smallest scope necessary.
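Applied to the original loop, that advice might look like the sketch below: strict and warnings on, every variable declared with my inside the block that uses it, File::Spec->catfile instead of a hard-coded backslash, and $tree->delete once each file is done. Taking the source directory and file names from @ARGV, and the button_report helper, are assumptions for illustration; ignore_unknown(0) and look_up follow the other reply in this thread.

```perl
use strict;
use warnings;
use File::Spec;
use HTML::TreeBuilder;

# Hypothetical helper: one "button X is in group Y" line per button.
# Every variable lives in the smallest block that needs it.
sub button_report {
    my ($path) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0);    # keep the unknown <group> elements
    $tree->parse_file($path) or die "Cannot read $path: $!\n";   # also calls eof

    my @lines;
    for my $button ($tree->look_down('_tag' => 'button')) {
        my $group = $button->look_up('_tag' => 'group');
        push @lines, sprintf "button %s is in group %s",
            $button->attr('name'),
            $group ? $group->attr('name') : '(none)';
    }
    $tree->delete;
    return @lines;
}

# Assumed command line: source directory first, then file names in it
# (standing in for $sourcedir and @filenames from the original code).
if (@ARGV) {
    my ($sourcedir, @filenames) = @ARGV;
    for my $filename (@filenames) {
        print "filename is $filename\n";
        print "  $_\n" for button_report(File::Spec->catfile($sourcedir, $filename));
    }
}
```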

HTH

-- 
Brian Raven 


HTML Parsing Question

2010-02-04 Thread Paul Rousseau

Hello,

I have an HTML file that contains the following snippet:

  .
  .
  .

I want to obtain the parent name for all "button" tags. I am having trouble
extracting the parent name. For example, I can find "Button1" and "Button24"
but I can not get to either item's parent name, "Group1".

Here is my code so far. (I believe it's something trivial; I feel I am so
close.)

foreach $filename (@filenames)
  {
  print "filename is $filename";
  $tree = HTML::TreeBuilder->new();
  $tree->parse_file("$sourcedir\\$filename");

  @Ans = $tree->look_down('_tag' => 'button');
  foreach $button (@Ans)
    {
    print "button is " . $button->attr('name');   # prints "button is Button1"
    print "button tag is " . $button->tag;
    # can also use ${$button}{_tag}; prints "button tag is button"
    print "button parent hash pointer is " . $button->parent;
    # can also use ${$button}{_parent}
    # prints "button parent hash pointer is HTML::Element=HASH(0x1afa5c4)"
    #
    #print Dumper($button->parent);
    #
    foreach $key (keys %{$button->parent})
      {
      print "key is $key";   # prints four keys: _parent, _content, _tag, _implicit
      }
    print "button _parent value is " . ${$button}{_parent};
    # prints "button _parent value is HTML::Element=HASH(0x1afa5c4)"
    print "button _tag value is " . ${$button}{_tag};   # prints "button _tag value is button"

    $parent = $button->parent;
    print "parent tag is " . ${$parent}{_tag};   # prints "parent tag is body"
    print "parent content is " . ${$parent}{_content};
    # prints "parent content is ARRAY(0x1afa7b0)"
    foreach $key (@{${$parent}{_content}})
      {
      #print "content item is $key";
      }

    #print "parent name is " . $parent->attr('name');

    $groupid = $button->look_up('_tag', => 'group');
    #  print "group id is " . $groupid;
    }

  #   $tree->dump;

  # print $tree->as_HTML;

  }

Any help would be appreciated.

Thank you

Paul Rousseau