RE: html tree question. clumsy ?

2003-09-25 Thread Thurn, Martin
  Thanks for your info Rob.  But look at what you wrote: "I'm pretty sure",
"I understand it to mean", "I believe"...
  My comments and solutions are based on my _actually_writing_code_ to try
to do the things you muse about, and _it_did_not_work_.  
  Don't take this the wrong way Rob, I just want to make things clear for
other people reading this who might run into the same problem and/or be
inclined to try it out.

 - - Martin

> -----Original Message-----
> From: Rob Dixon [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 23, 2003 5:27 PM
> To: [EMAIL PROTECTED]
> Subject: Re: html tree question. clumsy ?
> 
> 
> Martin Thurn wrote:
> >   I ran into similar problems for my module WWW::Search.
> >   No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> >   a new file.
> 
> I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
> HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
> methods. The documentation says:
> 
> After $p->eof has been called, the parse() and parse_file() methods
> can be invoked to feed new documents with the parser object.
> 
> which is poor English, but I understand it to mean that, once the 'eof'
> method has been called, any further calls to 'parse' or 'parse_file'
> will create a new HTML tree from scratch.
> 
> > BUT you can use the following code as a "reset". I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again.  This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> >   $self->{'_head'} = $self->insert_element('head',1);
> >   $self->{'_pos'} = undef;  # pull it back up
> >   $self->{'_body'} = $self->insert_element('body',1);
> >   $self->{'_pos'} = undef;  # pull it back up again
> 
> HTML::Parser will itself insert any implicit <html>, <head> and <body>
> tags when further input is parsed.
> 
> >   The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
> 
> Once you have called $tree->delete the object no longer exists, but I
> believe $tree->delete_content or $tree->eof will allow you to reuse
> the same object for parsing a new document.
> 
> Rob
> 
> 


Re: html tree question. clumsy ?

2003-09-24 Thread Sean M. Burke
At 09:42 AM 2003-09-23, James.Q.L wrote:
> 1. will $tree->parse_file(parse a new file) will overwrite the old parsed
> $tree content?

No, don't try that.  You can call parse_file on a $tree only once.  As far
as I remember, there's no re-using it.

> so that i dont have to delete the tree in the sub?

Why not just delete the tree?

As to your question here:

> ## why can't keep it outside ?/
> my $literal = HTML::Element->new('~literal','text' => $insert);

I think because once you insert the $literal object into a tree, then when 
you later destroy the tree, that deletes that $literal object, too.
(Deleting a tree basically means deleting every attribute in every node in 
the tree.)

--
Sean M. Burke    http://search.cpan.org/~sburke/
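
A minimal sketch of the approach suggested above, offered as an illustration rather than as anyone's actual code. It assumes the documents arrive as strings, and the names process_html and $insert are hypothetical. Each document gets a fresh tree that is deleted afterwards, and the tree receives a clone of the literal, so the original element kept outside the sub survives every delete.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Build the literal once, outside the sub. Only clones of it go into trees.
my $insert  = '<p>inserted text</p>';    # hypothetical content
my $literal = HTML::Element->new('~literal', 'text' => $insert);

# Hypothetical helper: parse one HTML string, append the literal to <body>,
# and return the serialized document.
sub process_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;       # fresh tree for every document
    $tree->parse($html);
    $tree->eof;                              # parse() needs an explicit eof()

    my $body = $tree->find_by_tag_name('body');
    $body->push_content($literal->clone);    # the tree owns the clone only

    my $out = $tree->as_HTML;
    $tree->delete;                           # destroys the clone, not $literal
    return $out;
}
```

With parse_file() in place of parse() the explicit eof() call is not needed, since parse_file() calls it for you.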


Re: html tree question. clumsy ?

2003-09-24 Thread James.Q.L

--- Rob Dixon <[EMAIL PROTECTED]> wrote:
> Martin Thurn wrote:
> >   I ran into similar problems for my module WWW::Search.
> >   No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> >   a new file.
> 
> I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
> HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
> methods. The documentation says:
> 
> After $p->eof has been called, the parse() and parse_file() methods
> can be invoked to feed new documents with the parser object.
> 
> which is poor English, but I understand it to mean that, once the 'eof'
> method has been called, any further calls to 'parse' or 'parse_file'
> will create a new HTML tree from scratch.

but from HTML::TreeBuilder,

"
$root->eof()
This signals that you're finished parsing content into this tree; this runs
various kinds of crucial cleanup on the tree. This is called for you when you
call $root->parse_file(...), but not when you call $root->parse(...). So if
you call $root->parse(...), then you must call $root->eof() once you've
finished feeding all the chunks to parse(...), and before you actually start
doing anything else with the tree in $root.
"

it says 'This is called for you when you call $root->parse_file'.

> >   The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
> 
> Once you have called $tree->delete the object no longer exists, but I
> believe $tree->delete_content or $tree->eof will allow you to reuse
> the same object for parsing a new document.
> 
> Rob
> 

i tried both delete_content and eof. it either yields an error, or the content of
the first parsed file is written into the files parsed after it. something like this:

define new tree $tree
in new sub, parse file with $tree, do something, then call $tree->delete_content
or/and $tree->eof


Qiang
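
To make the quoted documentation concrete, here is a small sketch of the parse()/eof() sequence it describes. The helper name tree_from_chunks and the sample HTML are made up for illustration. It shows only the documented sequence for a fresh tree and says nothing about re-using a tree afterwards.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: build one tree from a list of HTML chunks.
sub tree_from_chunks {
    my @chunks = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($_) for @chunks;    # parse() may be fed several chunks...
    $tree->eof;                      # ...but eof() must be called by hand
    return $tree;                    # the caller should $tree->delete it
}

my $tree = tree_from_chunks(
    '<html><body><h1>Hello</h1>',
    '<p>world</p></body></html>',
);
my $heading = $tree->find_by_tag_name('h1')->as_text;
print "$heading\n";
$tree->delete;
```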

 




Re: html tree question. clumsy ?

2003-09-24 Thread Rob Dixon
James.Q.L wrote:
>
> Thurn, Martin wrote:
> >
> >   I ran into similar problems for my module WWW::Search.
> >   No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file.  BUT you can use the following code as a "reset". I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again.  This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> >   $self->{'_head'} = $self->insert_element('head',1);
> >   $self->{'_pos'} = undef;  # pull it back up
> >   $self->{'_body'} = $self->insert_element('body',1);
> >   $self->{'_pos'} = undef;  # pull it back up again
>
> i will look into how that can be added to $tree later.
> but a reset method from HTML::TreeBuilder would be nice.

As per my previous post, I think $tree->eof or $tree->delete_content will
do the trick.

> >   The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
> >
> >  - - Martin Thurn
> >
>
> just found that $h->clone can get around this. too bad i can't do the same to a tree
> before it parses the file.
>
> from HTML::Element,
>
> You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're done
> being parsed, or 2) you don't expect to resume parsing into the clone. (You can
> continue parsing into the original; it is never affected.)
>
> just to make sure, (english isn't my first language) does it say that i can't clone
> a tree if it doesn't parse something?

No. What this is saying is that you can't do:

  my $tree = HTML::TreeBuilder->new;
  $tree->parse($string1);
  $tree->parse($string2);

  my $tree2 = $tree->clone;
  $tree2->parse($string3);   # not supported: resuming a parse in the clone

But, here, you could write

  $tree->parse($string4);

In other words, you can't carry on parsing HTML text into a cloned
tree, but the original tree is unaffected.

HTH,

Rob
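
For contrast, a short sketch of the case the documentation does allow: cloning a tree that has finished parsing and then editing the copy. The sample HTML and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><p>original</p></body></html>');
$tree->eof;                          # parsing is finished, so cloning is fine

my $copy = $tree->clone;             # deep copy; no parsing into it afterwards
$copy->find_by_tag_name('p')->push_content(' (edited copy)');

# Editing the copy leaves the original tree alone.
my $orig_text = $tree->find_by_tag_name('p')->as_text;
my $copy_text = $copy->find_by_tag_name('p')->as_text;
print "$orig_text\n$copy_text\n";

$_->delete for $tree, $copy;         # each tree has to be deleted separately
```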




Re: html tree question. clumsy ?

2003-09-24 Thread Rob Dixon
Martin Thurn wrote:
>   I ran into similar problems for my module WWW::Search.
>   No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
>   a new file.

I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
methods. The documentation says:

After $p->eof has been called, the parse() and parse_file() methods
can be invoked to feed new documents with the parser object.

which is poor English, but I understand it to mean that, once the 'eof'
method has been called, any further calls to 'parse' or 'parse_file'
will create a new HTML tree from scratch.

> BUT you can use the following code as a "reset". I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again.  This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
>
>   $self->{'_head'} = $self->insert_element('head',1);
>   $self->{'_pos'} = undef;  # pull it back up
>   $self->{'_body'} = $self->insert_element('body',1);
>   $self->{'_pos'} = undef;  # pull it back up again

HTML::Parser will itself insert any implicit <html>, <head> and <body>
tags when further input is parsed.

>   The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.

Once you have called $tree->delete the object no longer exists, but I
believe $tree->delete_content or $tree->eof will allow you to reuse
the same object for parsing a new document.

Rob




RE: html tree question. clumsy ?

2003-09-23 Thread James.Q.L

--- "Thurn, Martin" <[EMAIL PROTECTED]> wrote:
>   I ran into similar problems for my module WWW::Search.
>   No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> a new file.  BUT you can use the following code as a "reset". I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again.  This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
> 
>   $self->{'_head'} = $self->insert_element('head',1);
>   $self->{'_pos'} = undef;  # pull it back up
>   $self->{'_body'} = $self->insert_element('body',1);
>   $self->{'_pos'} = undef;  # pull it back up again

i will look into how that can be added to $tree later.
but a reset method from HTML::TreeBuilder would be nice.
 
>   The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.
> 
>  - - Martin Thurn
> 

just found that $h->clone can get around this. too bad i can't do the same to a tree
before it parses the file.

from HTML::Element,

You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're done being
parsed, or 2) you don't expect to resume parsing into the clone. (You can continue
parsing into the original; it is never affected.)

just to make sure, (english isn't my first language) does it say that i can't clone a
tree if it doesn't parse something?


Qiang



RE: html tree question. clumsy ?

2003-09-23 Thread Thurn, Martin
  I ran into similar problems for my module WWW::Search.
  No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
a new file.  BUT you can use the following code as a "reset". I.e. call
parse, muck with the tree, do the following four lines, and call parse
again.  This does the same as new() but without changing the store_comments,
store_pis settings, etc:

  $self->{'_head'} = $self->insert_element('head',1);
  $self->{'_pos'} = undef;  # pull it back up
  $self->{'_body'} = $self->insert_element('body',1);
  $self->{'_pos'} = undef;  # pull it back up again

  The reason you can't re-use your HTML::Element is because it's a
reference, and when the tree gets deleted, your Element gets deleted right
along with it.

 - - Martin Thurn
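
If the main reason for a "reset" is to keep the store_comments and store_pis settings, one alternative that stays within the public API is a small factory that builds a fresh, identically configured tree for every document. This is only a sketch and not the code from WWW::Search. The names new_configured_tree and comments_in are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical factory: every call returns a new tree with the same settings.
sub new_configured_tree {
    my $tree = HTML::TreeBuilder->new;
    $tree->store_comments(1);    # keep <!-- ... --> as ~comment nodes
    $tree->store_pis(1);         # keep processing instructions as well
    return $tree;
}

# Hypothetical helper: return the text of every comment in one HTML string.
sub comments_in {
    my ($html) = @_;
    my $tree = new_configured_tree();
    $tree->parse($html);
    $tree->eof;
    my @comments = map { $_->attr('text') }
                   $tree->look_down('_tag', '~comment');
    $tree->delete;               # nothing is carried over to the next call
    return @comments;
}
```

Because every call starts from a new object, no private keys such as _head or _pos have to be touched.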