RE: html tree question. clumsy ?
Thanks for your info Rob. But look at what you wrote: "I'm pretty sure", "I understand it to mean", "I believe"...

My comments and solutions are based on my _actually_writing_code_ to try to do the things you muse about, and _it_did_not_work_.

Don't take this the wrong way Rob, I just want to make things clear for other people reading this who might run into the same problem and/or be inclined to try it out.

 - - Martin

> -----Original Message-----
> From: Rob Dixon [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 23, 2003 5:27 PM
> To: [EMAIL PROTECTED]
> Subject: Re: html tree question. clumsy ?
>
> Martin Thurn wrote:
> > I ran into similar problems for my module WWW::Search.
> > No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file.
>
> I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
> HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
> methods. The documentation says:
>
>     After $p->eof has been called, the parse() and parse_file() methods
>     can be invoked to feed new documents with the parser object.
>
> which is poor English, but I understand it to mean that, once the 'eof'
> method has been called, any further calls to 'parse' or 'parse_file'
> will create a new HTML tree from scratch.
>
> > BUT you can use the following code as a "reset". I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again. This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> > $self->{'_head'} = $self->insert_element('head',1);
> > $self->{'_pos'} = undef; # pull it back up
> > $self->{'_body'} = $self->insert_element('body',1);
> > $self->{'_pos'} = undef; # pull it back up again
>
> HTML::Parser will itself insert any implicit <html>, <head> and <body>
> tags when further input is parsed.
>
> > The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
>
> Once you have called $tree->delete the object no longer exists, but I
> believe $tree->delete_content or $tree->eof will allow you to reuse
> the same object for parsing a new document.
>
> Rob
Re: html tree question. clumsy ?
At 09:42 AM 2003-09-23, James.Q.L wrote:
> 1. will $tree->parse_file(parse a new file) will overwrite the old parsed $tree content?

No, don't try that. You can call parse_file on a $tree only once. As far as I remember, there's no re-using it.

> so that i dont have to delete the tree in the sub?

Why not just delete the tree?

As to your question here:

> ## why can't keep it outside ?/
> my $literal = HTML::Element->new('~literal','text' => $insert);

I think because once you insert the $literal object into a tree, then when you later destroy the tree, that deletes that $literal object, too. (Deleting a tree basically means deleting every attribute in every node in the tree.)

--
Sean M. Burke http://search.cpan.org/~sburke/
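For readers who want to try the approach Sean describes (one tree per document, deleted when done), here is a minimal sketch. It is not code from the thread: the helper name render_with_literal and the sample HTML strings are made up for illustration, and it assumes that inserting a clone of the shared '~literal' element, as mentioned elsewhere in the thread, is enough to keep the original element alive when each tree is deleted.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical helper (not from the thread): parse one document in its own
# tree, append a copy of a shared literal element, and return the HTML.
sub render_with_literal {
    my ($html, $literal) = @_;

    my $tree = HTML::TreeBuilder->new;   # fresh tree for every document
    $tree->parse($html);
    $tree->eof;                          # parse() does not call eof for us

    my $body = $tree->find_by_tag_name('body');
    $body->push_content($literal->clone);   # the tree only ever owns the clone

    my $out = $tree->as_HTML;
    $tree->delete;                       # frees the tree, including the clone
    return $out;
}

# The shared element lives outside every tree, so deleting a tree leaves it alone.
my $literal = HTML::Element->new('~literal', text => '<!-- inserted -->');

my @pages = map { render_with_literal($_, $literal) } (
    '<html><body><p>one</p></body></html>',
    '<html><body><p>two</p></body></html>',
);
print "$_\n" for @pages;
```

Creating a new HTML::TreeBuilder object per document costs little and sidesteps the question of whether a used tree can be reset at all.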
Re: html tree question. clumsy ?
--- Rob Dixon <[EMAIL PROTECTED]> wrote:
> Martin Thurn wrote:
> > I ran into similar problems for my module WWW::Search.
> > No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file.
>
> I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses
> HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof'
> methods. The documentation says:
>
>     After $p->eof has been called, the parse() and parse_file() methods
>     can be invoked to feed new documents with the parser object.
>
> which is poor English, but I understand it to mean that, once the 'eof'
> method has been called, any further calls to 'parse' or 'parse_file'
> will create a new HTML tree from scratch.

but from HTML::TreeBuilder,

" $root->eof()
This signals that you're finished parsing content into this tree; this runs various kinds of crucial cleanup on the tree. This is called for you when you call $root->parse_file(...), but not when you call $root->parse(...). So if you call $root->parse(...), then you must call $root->eof() once you've finished feeding all the chunks to parse(...), and before you actually start doing anything else with the tree in $root."

it said 'This is called for you when you call $root->parse_file'.

> > The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
>
> Once you have called $tree->delete the object no longer exists, but I
> believe $tree->delete_content or $tree->eof will allow you to reuse
> the same object for parsing a new document.
>
> Rob

i tried both delete_content and eof, it either yield error or the first file being parsed is written to the rest of parsing file.

something like this

define new tree $tree
in new sub, parse file with $tree, do something , then called $tree->delete_content or/and $tree->eof

Qiang
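The documentation passage quoted above distinguishes parse_file (which calls eof for you) from parse (which does not). A small sketch of the parse-then-eof pattern follows; the chunk contents are invented for illustration, and the chunks are split at tag boundaries only to keep the example simple.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input, split into chunks as if it arrived piece by piece.
my @chunks = (
    '<html><head><title>Chunked</title></head>',
    '<body><p>Hello</p>',
    '<p>World</p></body></html>',
);

my $tree = HTML::TreeBuilder->new;
$tree->parse($_) for @chunks;   # feed the document one chunk at a time
$tree->eof;                     # must be called by hand after parse()

# Only now is it safe to inspect the tree.
my $title = $tree->find_by_tag_name('title')->as_text;
my @paras = map { $_->as_text } $tree->find_by_tag_name('p');
print "$title: @paras\n";

$tree->delete;                  # done with this tree; use a new one for the next document
```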
Re: html tree question. clumsy ?
James.Q.L wrote:
>
> Thurn, Martin wrote:
> >
> > I ran into similar problems for my module WWW::Search.
> > No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> > a new file. BUT you can use the following code as a "reset". I.e. call
> > parse, muck with the tree, do the following four lines, and call parse
> > again. This does the same as new() but without changing the store_comments,
> > store_pis settings, etc:
> >
> > $self->{'_head'} = $self->insert_element('head',1);
> > $self->{'_pos'} = undef; # pull it back up
> > $self->{'_body'} = $self->insert_element('body',1);
> > $self->{'_pos'} = undef; # pull it back up again
>
> i will look into how that can be added to $tree later.
> but a reset method from HTML::TreeBuilder would be nice.

As per my previous post, I think $tree->eof or $tree->delete_content will do the trick.

> > The reason you can't re-use your HTML::Element is because it's a
> > reference, and when the tree gets deleted, your Element gets deleted right
> > along with it.
> >
> > - - Martin Thurn
>
> just found that $h->clone can get around this. too bad i can't do the same
> to a tree before it parses the file.
>
> from HTML::Element,
>
> You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're
> done being parsed, or 2) you don't expect to resume parsing into the clone.
> (You can continue parsing into the original; it is never affected.)
>
> just to make sure, (english isn't my first language) does it say that i
> can't clone a tree if it doesn't parse something?

No. What this is saying is that you can't do:

    my $tree = new HTML::TreeBuilder;
    $tree->parse($string1);
    $tree->parse($string2);

    my $tree2 = $tree->clone;
    $tree2->parse($string3);

But, here, you could write

    $tree->parse($string4);

In other words, you can't carry on parsing HTML text into a cloned tree, but the original tree is unaffected.

HTH,

Rob
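As a complement to the "can't do" snippet above, here is a hedged sketch of the case the HTML::Element documentation does allow: cloning a tree that is done being parsed and then editing the clone. The sample document and the replacement text are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse('<html><body><p>original</p></body></html>');
$tree->eof;                       # finish parsing before cloning

my $copy = $tree->clone;          # deep copy of the finished tree

# Edit the clone only; nothing further is parsed into it.
my $p = $copy->find_by_tag_name('p');
$p->delete_content;
$p->push_content('changed');

my $orig_text = $tree->find_by_tag_name('p')->as_text;
my $copy_text = $copy->find_by_tag_name('p')->as_text;
print "original: $orig_text, copy: $copy_text\n";

$_->delete for $tree, $copy;      # each tree is deleted separately
```

The clone is an independent tree, so it needs its own delete call; deleting one does not free the other.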
Re: html tree question. clumsy ?
Martin Thurn wrote:
> I ran into similar problems for my module WWW::Search.
> No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> a new file.

I'm pretty sure you're wrong about that. HTML::TreeBuilder subclasses HTML::Parser, which provides the 'new', 'parse', 'parse_file', and 'eof' methods. The documentation says:

    After $p->eof has been called, the parse() and parse_file() methods
    can be invoked to feed new documents with the parser object.

which is poor English, but I understand it to mean that, once the 'eof' method has been called, any further calls to 'parse' or 'parse_file' will create a new HTML tree from scratch.

> BUT you can use the following code as a "reset". I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again. This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
>
> $self->{'_head'} = $self->insert_element('head',1);
> $self->{'_pos'} = undef; # pull it back up
> $self->{'_body'} = $self->insert_element('body',1);
> $self->{'_pos'} = undef; # pull it back up again

HTML::Parser will itself insert any implicit <html>, <head> and <body> tags when further input is parsed.

> The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.

Once you have called $tree->delete the object no longer exists, but I believe $tree->delete_content or $tree->eof will allow you to reuse the same object for parsing a new document.

Rob
RE: html tree question. clumsy ?
--- "Thurn, Martin" <[EMAIL PROTECTED]> wrote:
> I ran into similar problems for my module WWW::Search.
> No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse
> a new file. BUT you can use the following code as a "reset". I.e. call
> parse, muck with the tree, do the following four lines, and call parse
> again. This does the same as new() but without changing the store_comments,
> store_pis settings, etc:
>
> $self->{'_head'} = $self->insert_element('head',1);
> $self->{'_pos'} = undef; # pull it back up
> $self->{'_body'} = $self->insert_element('body',1);
> $self->{'_pos'} = undef; # pull it back up again

i will look into how that can be added to $tree later.
but a reset method from HTML::TreeBuilder would be nice.

> The reason you can't re-use your HTML::Element is because it's a
> reference, and when the tree gets deleted, your Element gets deleted right
> along with it.
>
> - - Martin Thurn

just found that $h->clone can get around this. too bad i can't do the same to a tree before it parses the file.

from HTML::Element,

You are free to clone HTML::TreeBuilder trees, just as long as: 1) they're done being parsed, or 2) you don't expect to resume parsing into the clone. (You can continue parsing into the original; it is never affected.)

just to make sure, (english isn't my first language) does it say that i can't clone a tree if it doesn't parse something?

Qiang
RE: html tree question. clumsy ?
I ran into similar problems for my module WWW::Search.

No, out-of-the-box you can not re-use an HTML::TreeBuilder object to parse a new file. BUT you can use the following code as a "reset". I.e. call parse, muck with the tree, do the following four lines, and call parse again. This does the same as new() but without changing the store_comments, store_pis settings, etc:

  $self->{'_head'} = $self->insert_element('head',1);
  $self->{'_pos'} = undef; # pull it back up
  $self->{'_body'} = $self->insert_element('body',1);
  $self->{'_pos'} = undef; # pull it back up again

The reason you can't re-use your HTML::Element is because it's a reference, and when the tree gets deleted, your Element gets deleted right along with it.

 - - Martin Thurn