RE: html tree question. clumsy ?

2003-09-25 Thread Thurn, Martin
  Thanks for your info Rob.  But look at what you wrote: "I'm pretty sure",
"I understand it to mean", "I believe"...
  My comments and solutions are based on my _actually_writing_code_ to try
to do the things you muse about, and _it_did_not_work_.  
  Don't take this the wrong way Rob, I just want to make things clear for
other people reading this who might run into the same problem and/or be
inclined to try it out.

 - - Martin

-----Original Message-----
From: Rob Dixon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 23, 2003 5:27 PM
To: [EMAIL PROTECTED]
Subject: Re: html tree question. clumsy ?


> Martin Thurn wrote:
> > I ran into similar problems for my module WWW::Search.
> > No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file.
>
> I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
> HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
> methods. The documentation says:
>
>     After $p->eof has been called, the parse() and parse_file() methods
>     can be invoked to feed new documents with the parser object.
>
> which is poor English, but I understand it to mean that, once the 'eof'
> method has been called, any further calls to 'parse' or 'parse_file'
> will create a new HTML tree from scratch.
>
> > BUT you can use the following code as a reset. I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again.  This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> >   $self->{'_head'} = $self->insert_element('head',1);
> >   $self->{'_pos'} = undef;  # pull it back up
> >   $self->{'_body'} = $self->insert_element('body',1);
> >   $self->{'_pos'} = undef;  # pull it back up again
>
> HTML::Parser will itself insert any implicit html, head and body
> tags when further input is parsed.
>
> > The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
>
> Once you have called $tree->delete the object no longer exists, but I
> believe $tree->delete_content or $tree->eof will allow you to reuse
> the same object for parsing a new document.
>
> Rob
 


Re: html tree question. clumsy ?

2003-09-24 Thread Rob Dixon
Martin Thurn wrote:
>
> I ran into similar problems for my module WWW::Search.
> No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> a new file.

I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
methods. The documentation says:

    After $p->eof has been called, the parse() and parse_file() methods
    can be invoked to feed new documents with the parser object.

which is poor English, but I understand it to mean that, once the 'eof'
method has been called, any further calls to 'parse' or 'parse_file'
will create a new HTML tree from scratch.

> BUT you can use the following code as a reset. I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again.  This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
>
>   $self->{'_head'} = $self->insert_element('head',1);
>   $self->{'_pos'} = undef;  # pull it back up
>   $self->{'_body'} = $self->insert_element('body',1);
>   $self->{'_pos'} = undef;  # pull it back up again

HTML::Parser will itself insert any implicit html, head and body
tags when further input is parsed.

> The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.

Once you have called $tree->delete the object no longer exists, but I
believe $tree->delete_content or $tree->eof will allow you to reuse
the same object for parsing a new document.

Rob




Re: html tree question. clumsy ?

2003-09-24 Thread Rob Dixon
James.Q.L wrote:
>
> Thurn, Martin wrote:
> >
> > I ran into similar problems for my module WWW::Search.
> > No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file.  BUT you can use the following code as a reset. I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again.  This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> >   $self->{'_head'} = $self->insert_element('head',1);
> >   $self->{'_pos'} = undef;  # pull it back up
> >   $self->{'_body'} = $self->insert_element('body',1);
> >   $self->{'_pos'} = undef;  # pull it back up again
>
> i will look into how that can be added to $tree later.
> but a reset method from HTML::TreeBuilder would be nice.

As per my previous post, I think $tree->eof or $tree->delete_content will
do the trick.

> > The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
> >
> >  - - Martin Thurn
>
> just found that $h->clone can get around this. too bad i can't do the same
> to a tree before it parses the file.
>
> from HTML::Element,
>
>     You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're
>     done being parsed, or 2) you don't expect to resume parsing into the clone.
>     (You can continue parsing into the original; it is never affected.)
>
> just to make sure, (english isn't my first language) does it say that i can't
> clone a tree if it doesn't parse something?

No. What this is saying is that you can't do:

  my $tree = new HTML::TreeBuilder;
  $tree->parse($string1);
  $tree->parse($string2);

  my $tree2 = $tree->clone;
  $tree2->parse($string3);

But, here, you could write

  $tree->parse($string4);

In other words, you can't carry on parsing HTML text into a cloned
tree, but the original tree is unaffected.
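
The case the documentation does allow can be sketched roughly as follows: finish parsing first, then clone, then edit the clone only. This is a minimal sketch and not taken from anyone's code in this thread; the sample HTML string and the variable names are made up for illustration, and it assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse one (made-up) document completely, then signal end of input.
my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><p>one</p><p>two</p></body></html>');
$tree->eof;

# Clone only after parsing has finished; the clone is an independent copy.
my $copy = $tree->clone;

# Edit the clone only: remove every <p> element from it.
$_->delete for $copy->look_down(_tag => 'p');

# Count the <p> elements left in each tree.
my @in_original = $tree->look_down(_tag => 'p');
my @in_copy     = $copy->look_down(_tag => 'p');
printf "original: %d <p>, clone: %d <p>\n",
    scalar @in_original, scalar @in_copy;

# Clean up both trees explicitly.
$copy->delete;
$tree->delete;
```

The clone is treated here as a finished tree to edit or print, never as a parser to feed more text into, which is the restriction the HTML::Element documentation describes.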

HTH,

Rob




Re: html tree question. clumsy ?

2003-09-24 Thread Sean M. Burke
At 09:42 AM 2003-09-23, James.Q.L wrote:
> 1. will $tree->parse_file(parse a new file) will overwrite the old parsed
> $tree content?

No, don't try that.  You can call parse_file on a $tree only once.  As far
as I remember, there's no re-using it.

> so that i dont have to delete the tree in the sub?

Why not just delete the tree?

As to your question here:

> ## why can't keep it outside ?/
> my $literal = HTML::Element->new('~literal','text' => $insert);

I think because once you insert the $literal object into a tree, then when
you later destroy the tree, that deletes that $literal object, too.
(Deleting a tree basically means deleting every attribute in every node in
the tree.)
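
One way to combine the two suggestions in this thread (a fresh tree per document, and $h->clone for the literal) might look like the sketch below. It is an illustration only, not the original poster's code: the page strings, the inserted snippet, and the @output array are hypothetical, and it assumes HTML::TreeBuilder and HTML::Element are installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical snippet to insert; the element is kept outside the loop
# and used only as a template.
my $insert  = '<b>inserted</b>';
my $literal = HTML::Element->new('~literal', 'text' => $insert);

# Made-up input documents standing in for the files being parsed.
my @pages = (
    '<html><body><p>first</p></body></html>',
    '<html><body><p>second</p></body></html>',
);

my @output;
for my $html (@pages) {
    # A fresh tree for every document, instead of re-using one.
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # Insert a clone, so the tree owns the copy and not the template.
    my $body = $tree->look_down(_tag => 'body');
    $body->push_content($literal->clone);

    push @output, $tree->as_HTML;

    # Deleting the tree destroys the clone; $literal itself is untouched.
    $tree->delete;
}

print "$_\n" for @output;
```

Because only clones ever enter a tree, deleting each tree at the end of the loop leaves the template element intact for the next document.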

--
Sean M. Burke    http://search.cpan.org/~sburke/


RE: html tree question. clumsy ?

2003-09-23 Thread Thurn, Martin
  I ran into similar problems for my module WWW::Search.
  No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
a new file.  BUT you can use the following code as a reset. I.e. call
parse, muck with the tree, do the following four lines, and call parse
again.  This does the same as new() but without changing the store_comments,
store_pis settings, etc:

  $self->{'_head'} = $self->insert_element('head',1);
  $self->{'_pos'} = undef;  # pull it back up
  $self->{'_body'} = $self->insert_element('body',1);
  $self->{'_pos'} = undef;  # pull it back up again

  The reason you can't re-use your HTML::Element is because it's a
reference, and when the tree gets deleted, your Element gets deleted right
along with it.

 - - Martin Thurn



RE: html tree question. clumsy ?

2003-09-23 Thread James.Q.L

--- Thurn, Martin [EMAIL PROTECTED] wrote:
> I ran into similar problems for my module WWW::Search.
> No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> a new file.  BUT you can use the following code as a reset. I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again.  This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
>
>   $self->{'_head'} = $self->insert_element('head',1);
>   $self->{'_pos'} = undef;  # pull it back up
>   $self->{'_body'} = $self->insert_element('body',1);
>   $self->{'_pos'} = undef;  # pull it back up again

i will look into how that can be added to $tree later.
but a reset method from HTML::TreeBuilder would be nice.
 
> The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.
>
>  - - Martin Thurn
 

just found that $h->clone can get around this. too bad i can't do the same
to a tree before it parses the file.

from HTML::Element,

    You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're
    done being parsed, or 2) you don't expect to resume parsing into the clone.
    (You can continue parsing into the original; it is never affected.)

just to make sure, (english isn't my first language) does it say that i can't
clone a tree if it doesn't parse something?


Qiang
